Results 1 - 17 of 17
1.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34524425

ABSTRACT

To enable personalized cancer treatment, machine learning models have been developed to predict drug response as a function of tumor and drug features. However, most algorithm development efforts have relied on cross-validation within a single study to assess model accuracy. While an essential first step, cross-validation within a biological data set typically provides an overly optimistic estimate of the prediction performance on independent test sets. To provide a more rigorous assessment of model generalizability between different studies, we use machine learning to analyze five publicly available cell line-based data sets: National Cancer Institute 60, Cancer Therapeutics Response Portal (CTRP), Genomics of Drug Sensitivity in Cancer, Cancer Cell Line Encyclopedia and Genentech Cell Line Screening Initiative (gCSI). Based on observed experimental variability across studies, we explore estimates of prediction upper bounds. We report performance results of a variety of machine learning models, with a multitasking deep neural network achieving the best cross-study generalizability. By multiple measures, models trained on CTRP yield the most accurate predictions on the remaining testing data, and gCSI is the most predictable among the cell line data sets included in this study. With these experiments and further simulations on partial data, two lessons emerge: (1) differences in viability assays can limit model generalizability across studies and (2) drug diversity, more than tumor diversity, is crucial for raising model generalizability in preclinical screening.


Subjects
Neoplasms , Algorithms , Cell Line , Humans , Machine Learning , Neoplasms/drug therapy , Neoplasms/genetics , Neural Networks, Computer
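
The cross-study evaluation described in this abstract (train on one study, test on every other) can be sketched as below. This is only an illustration under stated assumptions, not the authors' pipeline: the two studies, their data points, the one-feature linear model, and the names `fit_line`, `mse`, and `cross_study_matrix` are all hypothetical. The synthetic studies differ only by a constant offset, standing in for a difference between viability assays.

```python
def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(model, points):
    """Mean squared error of a fitted line on a set of (x, y) points."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in points) / len(points)


def cross_study_matrix(studies):
    """Train one model per study and score it on every study.

    Off-diagonal entries measure cross-study generalization. Diagonal
    entries here are plain training error, not cross-validated error.
    """
    models = {name: fit_line(points) for name, points in studies.items()}
    return {
        (train, test): mse(models[train], points)
        for train in studies
        for test, points in studies.items()
    }


# Synthetic data (assumption): same dose-response slope, but study_B's
# assay reads one unit higher than study_A's.
studies = {
    "study_A": [(float(x), 2.0 * x) for x in range(5)],
    "study_B": [(float(x), 2.0 * x + 1.0) for x in range(5)],
}
matrix = cross_study_matrix(studies)
```

In this toy setup the within-study error is zero while the error across studies equals the squared assay offset, which is the kind of gap that within-study evaluation alone would hide.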
2.
J Chem Inf Model ; 63(5): 1438-1453, 2023 03 13.
Article in English | MEDLINE | ID: mdl-36808989

ABSTRACT

Direct-acting antivirals for the treatment of the COVID-19 pandemic caused by the SARS-CoV-2 virus are needed to complement vaccination efforts. Given the ongoing emergence of new variants, automated experimentation and fast, active learning-based workflows for antiviral lead discovery remain critical to our ability to address the pandemic's evolution in a timely manner. While several such pipelines have been introduced to discover candidates with noncovalent interactions with the main protease (Mpro), here we developed a closed-loop artificial intelligence pipeline to design electrophilic warhead-based covalent candidates. This work introduces a deep learning-assisted automated computational workflow to introduce linkers and an electrophilic "warhead" to design covalent candidates and incorporates cutting-edge experimental techniques for validation. Using this process, promising candidates in the library were screened, and several potential hits were identified and tested experimentally using native mass spectrometry and fluorescence resonance energy transfer (FRET)-based screening assays. We identified four chloroacetamide-based covalent inhibitors of Mpro with micromolar affinities (KI of 5.27 µM) using our pipeline. Experimentally resolved binding modes for each compound were determined using room-temperature X-ray crystallography and are consistent with the predicted poses. The induced conformational changes based on molecular dynamics simulations further suggest that the dynamics may be an important factor to further improve selectivity, thereby effectively lowering KI and reducing toxicity. These results demonstrate the utility of our modular and data-driven approach for potent and selective covalent inhibitor discovery and provide a platform to apply it to other emerging targets.


Subjects
COVID-19 , Hepatitis C, Chronic , Humans , SARS-CoV-2/metabolism , Antiviral Agents/pharmacology , Pandemics , Artificial Intelligence , Protease Inhibitors/pharmacology , Molecular Docking Simulation
3.
Int J High Perform Comput Appl ; 37(1): 28-44, 2023 Jan.
Article in English | MEDLINE | ID: mdl-36647365

ABSTRACT

We seek to completely revise current models of airborne transmission of respiratory viruses by providing never-before-seen atomic-level views of the SARS-CoV-2 virus within a respiratory aerosol. Our work dramatically extends the capabilities of multiscale computational microscopy to address the significant gaps that exist in current experimental methods, which are limited in their ability to interrogate aerosols at the atomic/molecular level and thus obscure our understanding of airborne transmission. We demonstrate how our integrated data-driven platform provides a new way of exploring the composition, structure, and dynamics of aerosols and aerosolized viruses, while driving simulation method development along several important axes. We present a series of initial scientific discoveries for the SARS-CoV-2 Delta variant, noting that the full scientific impact of this work has yet to be realized.

4.
J Chem Inf Model ; 62(1): 116-128, 2022 01 10.
Article in English | MEDLINE | ID: mdl-34793155

ABSTRACT

Despite the recent availability of vaccines against the acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the search for inhibitory therapeutic agents has assumed importance especially in the context of emerging new viral variants. In this paper, we describe the discovery of a novel noncovalent small-molecule inhibitor, MCULE-5948770040, that binds to and inhibits the SARS-CoV-2 main protease (Mpro) by employing a scalable high-throughput virtual screening (HTVS) framework and a targeted compound library of over 6.5 million molecules that could be readily ordered and purchased. Our HTVS framework leverages the U.S. supercomputing infrastructure achieving nearly 91% resource utilization and nearly 126 million docking calculations per hour. Downstream biochemical assays validate this Mpro inhibitor with an inhibition constant (Ki) of 2.9 µM (95% CI 2.2, 4.0). Furthermore, using room-temperature X-ray crystallography, we show that MCULE-5948770040 binds to a cleft in the primary binding site of Mpro forming stable hydrogen bond and hydrophobic interactions. We then used multiple µs-time scale molecular dynamics (MD) simulations and machine learning (ML) techniques to elucidate how the bound ligand alters the conformational states accessed by Mpro, involving motions both proximal and distal to the binding site. Together, our results demonstrate how MCULE-5948770040 inhibits Mpro and offer a springboard for further therapeutic design.


Subjects
COVID-19 , Protease Inhibitors , Antiviral Agents , Coronavirus 3C Proteases , Humans , Molecular Docking Simulation , Molecular Dynamics Simulation , Orotic Acid/analogs & derivatives , Piperazines , SARS-CoV-2
5.
Int J High Perform Comput Appl ; 36(5-6): 603-623, 2022 Nov.
Article in English | MEDLINE | ID: mdl-38464362

ABSTRACT

The severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) replication transcription complex (RTC) is a multi-domain protein responsible for replicating and transcribing the viral mRNA inside a human cell. Attacking RTC function with pharmaceutical compounds is a pathway to treating COVID-19. Conventional tools, e.g., cryo-electron microscopy and all-atom molecular dynamics (AAMD), do not provide sufficiently high resolution or timescale to capture important dynamics of this molecular machine. Consequently, we develop an innovative workflow that bridges the gap between these resolutions, using mesoscale fluctuating finite element analysis (FFEA) continuum simulations and a hierarchy of AI-methods that continually learn and infer features for maintaining consistency between AAMD and FFEA simulations. We leverage a multi-site distributed workflow manager to orchestrate AI, FFEA, and AAMD jobs, providing optimal resource utilization across HPC centers. Our study provides unprecedented access to study the SARS-CoV-2 RTC machinery, while providing general capability for AI-enabled multi-resolution simulations at scale.

6.
BMC Bioinformatics ; 22(1): 252, 2021 May 17.
Article in English | MEDLINE | ID: mdl-34001007

ABSTRACT

BACKGROUND: Motivated by the size and availability of cell line drug sensitivity data, researchers have been developing machine learning (ML) models for predicting drug response to advance cancer treatment. As drug sensitivity studies continue generating drug response data, a common question is whether the generalization performance of existing prediction models can be further improved with more training data. METHODS: We utilize empirical learning curves for evaluating and comparing the data scaling properties of two neural networks (NNs) and two gradient boosting decision tree (GBDT) models trained on four cell line drug screening datasets. The learning curves are accurately fitted to a power law model, providing a framework for assessing the data scaling behavior of these models. RESULTS: The curves demonstrate that no single model dominates in terms of prediction performance across all datasets and training sizes, thus suggesting that the actual shape of these curves depends on the unique pair of an ML model and a dataset. The multi-input NN (mNN), in which gene expressions of cancer cells and molecular drug descriptors are input into separate subnetworks, outperforms a single-input NN (sNN), where the cell and drug features are concatenated for the input layer. In contrast, a GBDT with hyperparameter tuning exhibits superior performance as compared with both NNs at the lower range of training set sizes for two of the tested datasets, whereas the mNN consistently performs better at the higher range of training sizes. Moreover, the trajectory of the curves suggests that increasing the sample size is expected to further improve prediction scores of both NNs. These observations demonstrate the benefit of using learning curves to evaluate prediction models, providing a broader perspective on the overall data scaling characteristics. 
CONCLUSIONS: A fitted power law learning curve provides a forward-looking metric for analyzing prediction performance and can serve as a co-design tool to guide experimental biologists and computational scientists in the design of future experiments in prospective research studies.


Subjects
Neoplasms , Pharmaceutical Preparations , Cell Line , Learning Curve , Machine Learning , Neoplasms/drug therapy , Neoplasms/genetics , Prospective Studies
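
The power-law learning curve mentioned in this abstract can be illustrated with a minimal fit. The sketch below assumes the simplest form, error ≈ a · m^b for training-set size m, fitted by least squares in log-log space; the abstract does not state the exact functional form or fitting procedure the authors used, so this form, the function names `fit_power_law` and `extrapolate`, and the noise-free synthetic points are all assumptions made for illustration.

```python
import math


def fit_power_law(sizes, errors):
    """Fit errors ≈ a * sizes**b by least squares on log(size) vs log(error)."""
    xs = [math.log(m) for m in sizes]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # exponent (negative when error falls with data)
    a = math.exp(mean_y - b * mean_x)  # scale factor
    return a, b


def extrapolate(a, b, size):
    """Predicted error at a training-set size outside the fitted range."""
    return a * size ** b


# Synthetic, noise-free points generated from a = 2.0, b = -0.5 (assumption).
sizes = [1_000, 2_000, 4_000, 8_000, 16_000]
errors = [2.0 * m ** -0.5 for m in sizes]

a, b = fit_power_law(sizes, errors)
predicted = extrapolate(a, b, 64_000)
```

Once fitted, the curve gives a forward-looking estimate of the error to expect at a larger sample size, which is how such a fit could inform the planning of further screening experiments.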
7.
Int J High Perform Comput Appl ; 35(5): 432-451, 2021 Sep.
Article in English | MEDLINE | ID: mdl-38603008

ABSTRACT

We develop a generalizable AI-driven workflow that leverages heterogeneous HPC resources to explore the time-dependent dynamics of molecular systems. We use this workflow to investigate the mechanisms of infectivity of the SARS-CoV-2 spike protein, the main viral infection machinery. Our workflow enables more efficient investigation of spike dynamics in a variety of complex environments, including within a complete SARS-CoV-2 viral envelope simulation, which contains 305 million atoms and shows strong scaling on ORNL Summit using NAMD. We present several novel scientific discoveries, including the elucidation of the spike's full glycan shield, the role of spike glycans in modulating the infectivity of the virus, and the characterization of the flexible interactions between the spike and the human ACE2 receptor. We also demonstrate how AI can accelerate conformational sampling across different systems and pave the way for the future application of such methods to additional studies in SARS-CoV-2 and other molecular systems.

8.
Front Med (Lausanne) ; 10: 1086097, 2023.
Article in English | MEDLINE | ID: mdl-36873878

ABSTRACT

Cancer claims millions of lives yearly worldwide. While many therapies have been made available in recent years, by and large cancer remains unsolved. Exploiting computational predictive models to study and treat cancer holds great promise in improving drug development and personalized design of treatment plans, ultimately suppressing tumors, alleviating suffering, and prolonging lives of patients. A wave of recent papers demonstrates promising results in predicting cancer response to drug treatments while utilizing deep learning methods. These papers investigate diverse data representations, neural network architectures, learning methodologies, and evaluation schemes. However, deciphering promising predominant and emerging trends is difficult due to the variety of explored methods and the lack of a standardized framework for comparing drug response prediction models. To obtain a comprehensive landscape of deep learning methods, we conducted an extensive search and analysis of deep learning models that predict the response to single drug treatments. A total of 61 deep learning-based models have been curated, and summary plots were generated. Based on the analysis, observable patterns and prevalence of methods have been revealed. This review helps readers better understand the current state of the field and identify major challenges and promising solution paths.

9.
Sci Rep ; 13(1): 2105, 2023 02 06.
Article in English | MEDLINE | ID: mdl-36747041

ABSTRACT

Protein-ligand docking is a computational method for identifying drug leads. The method is capable of narrowing a vast library of compounds down to a tractable size for downstream simulation or experimental testing and is widely used in drug discovery. While there has been progress in accelerating scoring of compounds with artificial intelligence, few works have bridged these successes back to the virtual screening community in terms of utility and forward-looking development. We demonstrate the power of high-speed ML models by scoring 1 billion molecules in under a day (50 k predictions per GPU second). We showcase a workflow for docking utilizing surrogate AI-based models as a pre-filter to a standard docking workflow. Our workflow is ten times faster at screening a library of compounds than the standard technique, with an error rate less than 0.01% of detecting the underlying best scoring 0.1% of compounds. Our analysis of the speedup explains that another order of magnitude speedup must come from model accuracy rather than computing speed. In order to drive another order of magnitude of acceleration, we share a benchmark dataset consisting of 200 million 3D complex structures and 2D structure scores across a consistent set of 13 million "in-stock" molecules over 15 receptors, or binding sites, across the SARS-CoV-2 proteome. We believe this is strong evidence for the community to begin focusing on improving the accuracy of surrogate models to improve the ability to screen massive compound libraries 100 × or even 1000 × faster than current techniques and reduce missing top hits. The technique outlined aims to be a fast drop-in replacement for docking for screening billion-scale molecular libraries.


Subjects
COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/metabolism , Artificial Intelligence , Molecular Docking Simulation , Ligands , Proteins/metabolism
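
The pre-filter idea in this abstract (score everything with a cheap surrogate, dock only the best-ranked fraction, then check how many true top hits survive) can be sketched as follows. This is a toy illustration, not the published workflow: the integer "molecules", both scoring functions, the noise level, the 10% keep fraction, and the names `prefilter_then_dock` and `recall_of_top` are hypothetical. The last two lines only read the quoted rate literally as 50,000 predictions per second on one GPU; the abstract does not say how that maps onto its "under a day" figure.

```python
import random


def prefilter_then_dock(molecules, surrogate_score, dock_score, keep_fraction):
    """Rank by the cheap surrogate, then dock only the best-ranked fraction.

    Lower scores are treated as better. Returns the shortlisted molecules
    sorted by their (expensive) docking score.
    """
    n_keep = max(1, round(len(molecules) * keep_fraction))
    shortlist = sorted(molecules, key=surrogate_score)[:n_keep]
    return sorted(shortlist, key=dock_score)


def recall_of_top(molecules, docked, dock_score, top_fraction):
    """Fraction of the true best-scoring molecules that reached docking."""
    n_top = max(1, round(len(molecules) * top_fraction))
    true_top = set(sorted(molecules, key=dock_score)[:n_top])
    return len(true_top & set(docked)) / n_top


# Synthetic library (assumption): the true docking score of molecule m is m,
# and the surrogate sees that score with bounded uniform noise.
rng = random.Random(0)
molecules = list(range(10_000))
true_scores = {m: float(m) for m in molecules}
noisy_scores = {m: true_scores[m] + rng.uniform(-50.0, 50.0) for m in molecules}

docked = prefilter_then_dock(
    molecules, noisy_scores.get, true_scores.get, keep_fraction=0.1
)
recall = recall_of_top(molecules, docked, true_scores.get, top_fraction=0.001)

# Literal reading of the quoted throughput on a single GPU.
gpu_seconds = 1_000_000_000 / 50_000
gpu_hours = gpu_seconds / 3600
```

Here only a tenth of the library is docked, so the expensive step shrinks tenfold, and recall of the true top 0.1% depends entirely on how well the surrogate ranks molecules, consistent with the abstract's point that further speedup must come from model accuracy.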
10.
Cancers (Basel) ; 16(1)2023 Dec 21.
Article in English | MEDLINE | ID: mdl-38201477

ABSTRACT

Cancer is a heterogeneous disease in that tumors of the same histology type can respond differently to a treatment. Anti-cancer drug response prediction is of paramount importance for both drug development and patient treatment design. Although various computational methods and data have been used to develop drug response prediction models, it remains a challenging problem due to the complexities of cancer mechanisms and cancer-drug interactions. To better characterize the interaction between cancer and drugs, we investigate the feasibility of integrating computationally derived features of molecular mechanisms of action into prediction models. Specifically, we add docking scores of drug molecules and target proteins in combination with cancer gene expressions and molecular drug descriptors for building response models. The results demonstrate a marginal improvement in drug response prediction performance when adding docking scores as additional features, through tests on large drug screening data. We discuss the limitations of the current approach and provide the research community with a baseline dataset of the large-scale computational docking for anti-cancer drugs.

11.
Science ; 382(6671): eabo7201, 2023 11 10.
Article in English | MEDLINE | ID: mdl-37943932

ABSTRACT

We report the results of the COVID Moonshot, a fully open-science, crowdsourced, and structure-enabled drug discovery campaign targeting the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) main protease. We discovered a noncovalent, nonpeptidic inhibitor scaffold with lead-like properties that is differentiated from current main protease inhibitors. Our approach leveraged crowdsourcing, machine learning, exascale molecular simulations, and high-throughput structural biology and chemistry. We generated a detailed map of the structural plasticity of the SARS-CoV-2 main protease, extensive structure-activity relationships for multiple chemotypes, and a wealth of biochemical activity data. All compound designs (>18,000 designs), crystallographic data (>490 ligand-bound x-ray structures), assay data (>10,000 measurements), and synthesized molecules (>2400 compounds) for this campaign were shared rapidly and openly, creating a rich, open, and intellectual property-free knowledge base for future anticoronavirus drug discovery.


Subjects
COVID-19 Drug Treatment , Coronavirus 3C Proteases , Coronavirus Protease Inhibitors , Drug Discovery , SARS-CoV-2 , Humans , Coronavirus 3C Proteases/antagonists & inhibitors , Coronavirus 3C Proteases/chemistry , Molecular Docking Simulation , Coronavirus Protease Inhibitors/chemical synthesis , Coronavirus Protease Inhibitors/chemistry , Coronavirus Protease Inhibitors/pharmacology , Structure-Activity Relationship , Crystallography, X-Ray
12.
Methods Mol Biol ; 2390: 301-319, 2022.
Article in English | MEDLINE | ID: mdl-34731475

ABSTRACT

Ultrahigh-throughput virtual screening (uHTVS) is an emerging field linking together classical docking techniques with high-throughput AI methods. We outline mechanistic docking models' goals and successes. We present different AI accelerated workflows for uHTVS, mainly through surrogate docking models. We showcase a novel feature representation technique, molecular depictions (images), as a surrogate model for docking. Along with a discussion on analyzing screens using regression enrichment surfaces at the tens of billion scale, we outline a future for uHTVS screening pipelines with deep learning.


Subjects
Deep Learning , Ligands , Molecular Docking Simulation , Proteins
13.
Patterns (N Y) ; 3(2): 100446, 2022 Feb 11.
Article in English | MEDLINE | ID: mdl-35199069

ABSTRACT

Artificial intelligence (AI) for science is a growing area of interdisciplinary computer science research focused on solving some of the most pressing global issues. While many cite AI's technical advances as the innovative force of the endeavor, I argue that interdisciplinarity, democratization, and cogent justification toward global citizens are driving forces to be fostered in the program's development.

14.
bioRxiv ; 2022 Nov 23.
Article in English | MEDLINE | ID: mdl-36451881

ABSTRACT

We seek to transform how new and emergent variants of pandemic-causing viruses, specifically SARS-CoV-2, are identified and classified. By adapting large language models (LLMs) for genomic data, we build genome-scale language models (GenSLMs) which can learn the evolutionary landscape of SARS-CoV-2 genomes. By pre-training on over 110 million prokaryotic gene sequences and fine-tuning a SARS-CoV-2-specific model on 1.5 million genomes, we show that GenSLMs can accurately and rapidly identify variants of concern. Thus, to our knowledge, GenSLMs represents one of the first whole genome scale foundation models which can generalize to other prediction tasks. We demonstrate scaling of GenSLMs on GPU-based supercomputers and AI-hardware accelerators utilizing 1.63 Zettaflops in training runs with a sustained performance of 121 PFLOPS in mixed precision and peak of 850 PFLOPS. We present initial scientific insights from examining GenSLMs in tracking evolutionary dynamics of SARS-CoV-2, paving the path to realizing this on large biological data.

15.
J Med Chem ; 64(23): 17366-17383, 2021 12 09.
Article in English | MEDLINE | ID: mdl-34705466

ABSTRACT

Creating small-molecule antivirals specific for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) proteins is crucial to battle coronavirus disease 2019 (COVID-19). SARS-CoV-2 main protease (Mpro) is an established drug target for the design of protease inhibitors. We performed a structure-activity relationship (SAR) study of noncovalent compounds that bind in the enzyme's substrate-binding subsites S1 and S2, revealing structural, electronic, and electrostatic determinants of these sites. The study was guided by the X-ray/neutron structure of Mpro complexed with Mcule-5948770040 (compound 1), in which protonation states were directly visualized. Virtual reality-assisted structure analysis and small-molecule building were employed to generate analogues of 1. In vitro enzyme inhibition assays and room-temperature X-ray structures demonstrated the effect of chemical modifications on Mpro inhibition, showing that (1) maintaining correct geometry of an inhibitor's P1 group is essential to preserve the hydrogen bond with the protonated His163; (2) a positively charged linker is preferred; and (3) subsite S2 prefers nonbulky modestly electronegative groups.


Subjects
Coronavirus 3C Proteases , Protease Inhibitors , Orotic Acid/analogs & derivatives , Piperazines , Protein Conformation , Static Electricity
16.
Interface Focus ; 11(6): 20210018, 2021 Dec 06.
Article in English | MEDLINE | ID: mdl-34956592

ABSTRACT

The race to meet the challenges of the global pandemic has served as a reminder that the existing drug discovery process is expensive, inefficient and slow. There is a major bottleneck in screening the vast number of potential small molecules to shortlist lead compounds for antiviral drug development. New opportunities to accelerate drug discovery lie at the interface between machine learning methods, in this case, developed for linear accelerators, and physics-based methods. The two in silico methods each have their own advantages and limitations which, interestingly, complement each other. Here, we present an innovative infrastructural development that combines both approaches to accelerate drug discovery. The scale of the potential resulting workflow is such that it is dependent on supercomputing to achieve extremely high throughput. We have demonstrated the viability of this workflow for the study of inhibitors for four COVID-19 target proteins and our ability to perform the required large-scale calculations to identify lead antiviral compounds through repurposing on a variety of supercomputers.

17.
bioRxiv ; 2021 Nov 15.
Article in English | MEDLINE | ID: mdl-34816263

ABSTRACT

We seek to completely revise current models of airborne transmission of respiratory viruses by providing never-before-seen atomic-level views of the SARS-CoV-2 virus within a respiratory aerosol. Our work dramatically extends the capabilities of multiscale computational microscopy to address the significant gaps that exist in current experimental methods, which are limited in their ability to interrogate aerosols at the atomic/molecular level and thus obscure our understanding of airborne transmission. We demonstrate how our integrated data-driven platform provides a new way of exploring the composition, structure, and dynamics of aerosols and aerosolized viruses, while driving simulation method development along several important axes. We present a series of initial scientific discoveries for the SARS-CoV-2 Delta variant, noting that the full scientific impact of this work has yet to be realized.

ACM REFERENCE FORMAT: Abigail Dommer, Lorenzo Casalino, Fiona Kearns, Mia Rosenfeld, Nicholas Wauer, Surl-Hee Ahn, John Russo, Sofia Oliveira, Clare Morris, Anthony Bogetti, Anda Trifan, Alexander Brace, Terra Sztain, Austin Clyde, Heng Ma, Chakra Chennubhotla, Hyungro Lee, Matteo Turilli, Syma Khalid, Teresa Tamayo-Mendoza, Matthew Welborn, Anders Christensen, Daniel G. A. Smith, Zhuoran Qiao, Sai Krishna Sirumalla, Michael O'Connor, Frederick Manby, Anima Anandkumar, David Hardy, James Phillips, Abraham Stern, Josh Romero, David Clark, Mitchell Dorrell, Tom Maiden, Lei Huang, John McCalpin, Christopher Woods, Alan Gray, Matt Williams, Bryan Barker, Harinda Rajapaksha, Richard Pitts, Tom Gibbs, John Stone, Daniel Zuckerman, Adrian Mulholland, Thomas Miller III, Shantenu Jha, Arvind Ramanathan, Lillian Chong, Rommie Amaro. 2021. #COVIDisAirborne: AI-Enabled Multiscale Computational Microscopy of Delta SARS-CoV-2 in a Respiratory Aerosol. In Supercomputing '21: International Conference for High Performance Computing, Networking, Storage, and Analysis. ACM, New York, NY, USA, 14 pages. https://doi.org/finalDOI.
