Results 1 - 20 of 49
1.
RSC Med Chem ; 15(3): 1015-1021, 2024 Mar 20.
Article in English | MEDLINE | ID: mdl-38516605

ABSTRACT

High throughput and rapid biological evaluation of small molecules is an essential factor in drug discovery and development. Direct-to-biology (D2B), whereby compound purification is forgone, has emerged as a viable technique in time-efficient screening, specifically for PROTAC design and biological evaluation. However, one notable limitation is the prerequisite of high-yielding reactions to ensure the desired compound is indeed the compound responsible for biological activity. Herein, we report a machine-learning-based yield-assay deconfounder capable of deconvoluting low yield from low potency to identify false negatives. We validated this approach by identifying promising SARS-CoV-2 main protease inhibitors with nanomolar activity that rivaled potency observed from the standard D2B workflow. Furthermore, we show how our framework can be utilized in a broad, in silico screen to produce compounds of similar potency as a D2B assay.

2.
Nat Chem ; 16(4): 633-643, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38168924

ABSTRACT

High-throughput experimentation (HTE) has the potential to improve our understanding of organic chemistry by systematically interrogating reactivity across diverse chemical spaces. Notable bottlenecks include few publicly available large-scale datasets and the need for facile interpretation of these data's hidden chemical insights. Here we report the development of a high-throughput experimentation analyser, a robust and statistically rigorous framework, which is applicable to any HTE dataset regardless of size, scope or target reaction outcome, which yields interpretable correlations between starting material(s), reagents and outcomes. We improve the HTE data landscape with the disclosure of 39,000+ previously proprietary HTE reactions that cover a breadth of chemistry, including cross-coupling reactions and chiral salt resolutions. The high-throughput experimentation analyser was validated on cross-coupling and hydrogenation datasets, showcasing the elucidation of statistically significant hidden relationships between reaction components and outcomes, as well as highlighting areas of dataset bias and the specific reaction spaces that necessitate further investigation.

3.
Nat Commun ; 15(1): 426, 2024 Jan 15.
Article in English | MEDLINE | ID: mdl-38225239

ABSTRACT

Structural diversification of lead molecules is a key component of drug discovery to explore chemical space. Late-stage functionalizations (LSFs) are versatile methodologies capable of installing functional handles on richly decorated intermediates to deliver numerous diverse products in a single reaction. Predicting the regioselectivity of LSF is still an open challenge in the field. Numerous efforts from chemoinformatics and machine learning (ML) groups have made strides in this area. However, it is arduous to isolate and characterize the multitude of LSF products generated, limiting available data and hindering pure ML approaches. We report the development of an approach that combines a message passing neural network and 13C NMR-based transfer learning to predict the atom-wise probabilities of functionalization for Minisci and P450-based functionalizations. We validated our model both retrospectively and with a series of prospective experiments, showing that it accurately predicts the outcomes of Minisci-type and P450 transformations and outperforms the well-established Fukui-based reactivity indices and other machine learning reactivity-based algorithms.


Subjects
Drug Discovery; Neural Networks, Computer; Prospective Studies; Retrospective Studies; Drug Discovery/methods; Machine Learning
4.
Science ; 382(6671): eabo7201, 2023 Nov 10.
Article in English | MEDLINE | ID: mdl-37943932

ABSTRACT

We report the results of the COVID Moonshot, a fully open-science, crowdsourced, and structure-enabled drug discovery campaign targeting the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) main protease. We discovered a noncovalent, nonpeptidic inhibitor scaffold with lead-like properties that is differentiated from current main protease inhibitors. Our approach leveraged crowdsourcing, machine learning, exascale molecular simulations, and high-throughput structural biology and chemistry. We generated a detailed map of the structural plasticity of the SARS-CoV-2 main protease, extensive structure-activity relationships for multiple chemotypes, and a wealth of biochemical activity data. All compound designs (>18,000 designs), crystallographic data (>490 ligand-bound x-ray structures), assay data (>10,000 measurements), and synthesized molecules (>2400 compounds) for this campaign were shared rapidly and openly, creating a rich, open, and intellectual property-free knowledge base for future anticoronavirus drug discovery.


Subjects
COVID-19 Drug Treatment; Coronavirus 3C Proteases; Coronavirus Protease Inhibitors; Drug Discovery; SARS-CoV-2; Humans; Coronavirus 3C Proteases/antagonists & inhibitors; Coronavirus 3C Proteases/chemistry; Molecular Docking Simulation; Coronavirus Protease Inhibitors/chemical synthesis; Coronavirus Protease Inhibitors/chemistry; Coronavirus Protease Inhibitors/pharmacology; Structure-Activity Relationship; Crystallography, X-Ray
5.
Nat Rev Drug Discov ; 22(7): 585-603, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37173515

ABSTRACT

During the coronavirus disease 2019 (COVID-19) pandemic, a wave of rapid and collaborative drug discovery efforts took place in academia and industry, culminating in several therapeutics being discovered, approved and deployed in a 2-year time frame. This article summarizes the collective experience of several pharmaceutical companies and academic collaborations that were active in severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) antiviral discovery. We outline our opinions and experiences on key stages in the small-molecule drug discovery process: target selection, medicinal chemistry, antiviral assays, animal efficacy and attempts to pre-empt resistance. We propose strategies that could accelerate future efforts and argue that a key bottleneck is the lack of quality chemical probes around understudied viral targets, which would serve as a starting point for drug discovery. Considering the small size of the viral proteome, comprehensively building an arsenal of probes for proteins in viruses of pandemic concern is a worthwhile and tractable challenge for the community.


Subjects
COVID-19; Animals; Antiviral Agents/pharmacology; Antiviral Agents/therapeutic use; SARS-CoV-2; Drug Discovery; Pandemics
6.
Proc Natl Acad Sci U S A ; 120(11): e2214168120, 2023 Mar 14.
Article in English | MEDLINE | ID: mdl-36877844

ABSTRACT

A common challenge in drug design pertains to finding chemical modifications to a ligand that increase its affinity to the target protein. An underutilized advance is the increase in structural biology throughput, which has progressed from an artisanal endeavor to a monthly throughput of hundreds of different ligands against a protein in modern synchrotrons. However, the missing piece is a framework that turns high-throughput crystallography data into predictive models for ligand design. Here, we designed a simple machine learning approach that predicts protein-ligand affinity from experimental structures of diverse ligands against a single protein paired with biochemical measurements. Our key insight is using physics-based energy descriptors to represent protein-ligand complexes and a learning-to-rank approach that infers the relevant differences between binding modes. We ran a high-throughput crystallography campaign against the SARS-CoV-2 main protease (MPro), obtaining parallel measurements of over 200 protein-ligand complexes and their binding activities. This allowed us to design one-step library syntheses that improved the potency of two distinct micromolar hits by over 10-fold, arriving at a noncovalent and nonpeptidomimetic inhibitor with 120 nM antiviral efficacy. Crucially, our approach successfully extends ligands to unexplored regions of the binding pocket, executing large and fruitful moves in chemical space with simple chemistry.
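
Editorial illustration (not from the article): the abstract names a learning-to-rank approach but does not state the loss that was used. The sketch below shows one common choice, a pairwise logistic ranking loss, purely as an assumption. The function name `pairwise_logistic_loss` and the toy scores are hypothetical.

```python
import math

def pairwise_logistic_loss(scores, activities):
    """Mean logistic loss over all ligand pairs with a known activity ordering.

    scores     -- model scores, one per ligand (higher = predicted more active)
    activities -- measured activities, one per ligand (higher = more active)
    """
    total, count = 0.0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if activities[i] > activities[j]:
                # The loss is small when ligand i is scored well above ligand j.
                total += math.log1p(math.exp(-(scores[i] - scores[j])))
                count += 1
    return total / count if count else 0.0

# Toy data: three ligands with measured activities 3 > 2 > 1.
measured = [3.0, 2.0, 1.0]
loss_correct_order = pairwise_logistic_loss([2.0, 1.0, 0.0], measured)
loss_reversed_order = pairwise_logistic_loss([0.0, 1.0, 2.0], measured)
```

A ranking loss of this kind only uses the relative order of the measurements, which is one reason such objectives are often chosen when absolute assay values are noisy.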


Subjects
COVID-19; Humans; Ligands; SARS-CoV-2; Antiviral Agents; Biology
7.
Chem Sci ; 13(45): 13541-13551, 2022 Nov 23.
Article in English | MEDLINE | ID: mdl-36507171

ABSTRACT

Photoswitchable molecules display two or more isomeric forms that may be accessed using light. Separating the electronic absorption bands of these isomers is key to selectively addressing a specific isomer and achieving high photostationary states, whilst overall red-shifting of the absorption bands serves to limit material damage due to UV exposure and increases penetration depth in photopharmacological applications. Engineering these properties into a system through synthetic design, however, remains a challenge. Here, we present a data-driven discovery pipeline for molecular photoswitches underpinned by dataset curation and multitask learning with Gaussian processes. In the prediction of electronic transition wavelengths, we demonstrate that a multioutput Gaussian process (MOGP) trained using labels from four photoswitch transition wavelengths yields the strongest predictive performance relative to single-task models, as well as operationally outperforming time-dependent density functional theory (TD-DFT) in terms of the wall-clock time for prediction. We validate our proposed approach experimentally by screening a library of commercially available photoswitchable molecules. Through this screen, we identified several motifs that displayed separated electronic absorption bands of their isomers, exhibited red-shifted absorptions, and are suited for information transfer and photopharmacological applications. Our curated dataset, code, as well as all models are made available at https://github.com/Ryan-Rhys/The-Photoswitch-Dataset.

8.
Nat Commun ; 13(1): 4806, 2022 Aug 16.
Article in English | MEDLINE | ID: mdl-35974010

ABSTRACT

Accurate forecasting of lithium-ion battery performance is essential for easing consumer concerns about the safety and reliability of electric vehicles. Most research on battery health prognostics focuses on the research and development setting where cells are subjected to the same usage patterns. However, in practical operation, there is great variability in use across cells and cycles, thus making forecasting challenging. To address this challenge, here we propose a combination of electrochemical impedance spectroscopy measurements with probabilistic machine learning methods. Making use of a dataset of 88 commercial lithium-ion coin cells generated via multistage charging and discharging (with currents randomly changed between cycles), we show that future discharge capacities can be predicted with calibrated uncertainties, given the future cycling protocol and a single electrochemical impedance spectroscopy measurement made immediately before charging, and without any knowledge of usage history. The results are robust to cell manufacturer, the distribution of cycling protocols, and temperature. The research outcome also suggests that battery health is better quantified by a multidimensional vector rather than a scalar state of health.


Subjects
Electric Power Supplies; Lithium; Electric Impedance; Electrodes; Ions; Lithium/chemistry; Reproducibility of Results
9.
Sci Adv ; 8(30): eabn4117, 2022 Jul 29.
Article in English | MEDLINE | ID: mdl-35895811

ABSTRACT

A fundamental challenge in materials science pertains to elucidating the relationship between stoichiometry, stability, structure, and property. Recent advances have shown that machine learning can be used to learn such relationships, allowing the stability and functional properties of materials to be accurately predicted. However, most of these approaches use atomic coordinates as input and are thus bottlenecked by crystal structure identification when investigating previously unidentified materials. Our approach solves this bottleneck by coarse-graining the infinite search space of atomic coordinates into a combinatorially enumerable search space. The key idea is to use Wyckoff representations, coordinate-free sets of symmetry-related positions in a crystal, as the input to a machine learning model. Our model demonstrates exceptionally high precision in finding unknown theoretically stable materials, identifying 1569 materials that lie below the known convex hull of previously calculated materials from just 5675 ab initio calculations. Our approach opens up fundamental advances in computational materials discovery.

10.
Nat Rev Chem ; 6(4): 287-295, 2022 Apr.
Article in English | MEDLINE | ID: mdl-35783295

ABSTRACT

One aspirational goal of computational chemistry is to predict potent and drug-like binders for any protein, such that only those that bind are synthesized. In this Roadmap, we describe the launch of Critical Assessment of Computational Hit-finding Experiments (CACHE), a public benchmarking project to compare and improve small molecule hit-finding algorithms through cycles of prediction and experimental testing. Participants will predict small molecule binders for new and biologically relevant protein targets representing different prediction scenarios. Predicted compounds will be tested rigorously in an experimental hub, and all predicted binders as well as all experimental screening data, including the chemical structures of experimentally tested compounds, will be made publicly available, and not subject to any intellectual property restrictions. The ability of a range of computational approaches to find novel binders will be evaluated, compared, and openly published. CACHE will launch 3 new benchmarking exercises every year. The outcomes will be better prediction methods, new small molecule binders for target proteins of importance for fundamental biology or drug discovery, and a major technological step towards achieving the goal of Target 2035, a global initiative to identify pharmacological probes for all human proteins.

11.
ChemMedChem ; 17(7): e202100641, 2022 Apr 05.
Article in English | MEDLINE | ID: mdl-35191598

ABSTRACT

The pentafluorosulfanyl (-SF5) functional group is of increasing interest as a bioisostere in medicinal chemistry. A library of SF5-containing compounds, including amide, isoxazole, and oxindole derivatives, was synthesised using a range of solution-based and solventless methods, including microwave and ball-mill techniques. The library was tested against targets including human dihydroorotate dehydrogenase (HDHODH). A subsequent focused approach led to synthesis of analogues of the clinically used disease-modifying anti-rheumatic drugs (DMARDs), Teriflunomide and Leflunomide, considered for potential COVID-19 use, where SF5 bioisostere deployment led to improved inhibition of HDHODH compared with the parent drugs. The results demonstrate the utility of the SF5 group in medicinal chemistry.


Subjects
Chemistry, Pharmaceutical; Dihydroorotate Dehydrogenase; Amides; Dihydroorotate Dehydrogenase/antagonists & inhibitors; Humans
12.
Soft Matter ; 17(21): 5393-5400, 2021 Jun 02.
Article in English | MEDLINE | ID: mdl-33969369

ABSTRACT

A key challenge for soft materials design and coarse-graining simulations is determining interaction potentials between components that give rise to desired condensed-phase structures. In theory, the Ornstein-Zernike equation provides an elegant framework for solving this inverse problem. Pioneering work in liquid state theory derived analytical closures for the framework. However, these analytical closures are approximations, valid only for specific classes of interaction potentials. In this work, we combine the physics of liquid state theory with machine learning to infer a closure directly from simulation data. The resulting closure is more accurate than commonly used closures across a broad range of interaction potentials.

13.
Chem Commun (Camb) ; 57(48): 5909-5912, 2021 Jun 15.
Article in English | MEDLINE | ID: mdl-34008627

ABSTRACT

The SARS-CoV-2 main viral protease (Mpro) is an attractive target for antivirals given its distinctiveness from host proteases, essentiality in the viral life cycle and conservation across Coronaviridae. We launched the COVID Moonshot initiative to rapidly develop patent-free antivirals with open science and open data. Here we report the use of machine learning for de novo design, coupled with synthesis route prediction, in our campaign. We discover novel chemical scaffolds active in biochemical and live virus assays, synthesized with model-generated routes.


Subjects
Antiviral Agents/pharmacology; Coronavirus 3C Proteases/antagonists & inhibitors; Cysteine Proteinase Inhibitors/pharmacology; SARS-CoV-2/enzymology; Antiviral Agents/chemical synthesis; Coronavirus OC43, Human/drug effects; Cysteine Proteinase Inhibitors/chemical synthesis; Drug Design; Drug Discovery/methods; Machine Learning; Microbial Sensitivity Tests
14.
Proc Natl Acad Sci U S A ; 118(15), 2021 Apr 13.
Article in English | MEDLINE | ID: mdl-33827920

ABSTRACT

Intracellular phase separation of proteins into biomolecular condensates is increasingly recognized as a process with a key role in cellular compartmentalization and regulation. Different hypotheses about the parameters that determine the tendency of proteins to form condensates have been proposed, with some of them probed experimentally through the use of constructs generated by sequence alterations. To broaden the scope of these observations, we established an in silico strategy for understanding on a global level the associations between protein sequence and phase behavior and further constructed machine-learning models for predicting protein liquid-liquid phase separation (LLPS). Our analysis highlighted that LLPS-prone proteins are more disordered, less hydrophobic, and of lower Shannon entropy than sequences in the Protein Data Bank or the Swiss-Prot database and that they show a fine balance in their relative content of polar and hydrophobic residues. To further learn in a hypothesis-free manner the sequence features underpinning LLPS, we trained a neural network-based language model and found that a classifier constructed on such embeddings learned the underlying principles of phase behavior at a comparable accuracy to a classifier that used knowledge-based features. By combining knowledge-based features with unsupervised embeddings, we generated an integrated model that distinguished LLPS-prone sequences both from structured proteins and from unstructured proteins with a lower LLPS propensity and further identified such sequences from the human proteome at a high accuracy. These results provide a platform rooted in molecular principles for understanding protein phase behavior. The predictor, termed DeePhase, is accessible from https://deephase.ch.cam.ac.uk/.
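
Editorial illustration (not from the article): the abstract compares the Shannon entropy of LLPS-prone sequences with that of database sequences. A minimal sketch of one way to compute such a quantity follows, assuming the entropy is taken over the residue composition of a single sequence; the article's exact definition is not given in the abstract. The function name `shannon_entropy` and the example sequences are hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(sequence: str) -> float:
    """Shannon entropy (in bits) of the residue composition of a sequence."""
    counts = Counter(sequence)
    total = len(sequence)
    # Sum -p * log2(p) over the observed residue frequencies p.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# A repetitive, low-complexity toy sequence versus one using all 20 residues once.
low_complexity = shannon_entropy("GGSGGSGGSGGS")
high_complexity = shannon_entropy("ACDEFGHIKLMNPQRSTVWY")
```

Under this definition, a sequence built from few residue types scores lower than one that uses the full amino acid alphabet evenly, which matches the direction of the trend the abstract reports for LLPS-prone proteins.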


Subjects
Amino Acid Sequence; Machine Learning; Sequence Analysis, Protein/methods; Animals; Humans; Hydrophobic and Hydrophilic Interactions
15.
J Chem Phys ; 154(13): 134902, 2021 Apr 07.
Article in English | MEDLINE | ID: mdl-33832269

ABSTRACT

Electrolytes play an important role in a plethora of applications ranging from energy storage to biomaterials. Notwithstanding this, the structure of concentrated electrolytes remains enigmatic. Many theoretical approaches attempt to model the concentrated electrolyte by introducing the idea of ion pairs, with ions either being tightly "paired" with a counter-ion or "free" to screen charge. In this study, we reframe the problem into the language of computational statistics and test the null hypothesis that all ions share the same local environment. Applying the framework to molecular dynamics simulations, we find that this null hypothesis is not supported by data. Our statistical technique suggests the presence of two distinct local ionic environments at intermediate concentrations, whose differences surprisingly originate in like charge correlations rather than unlike charge attraction. Through considering the effect of these "aggregated" and "non-aggregated" states on bulk properties including effective ion concentration and dielectric constant, we identify a scaling relation between the effective screening length and theoretical Debye length, which applies across different dielectric constants and ion concentrations.
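
Editorial illustration (not from the article): the theoretical Debye length mentioned in the abstract is the standard dilute-limit screening length, the square root of (relative permittivity × vacuum permittivity × k_B × T) divided by the sum over ion species of number density × charge squared. The sketch below evaluates it; the function name `debye_length` and the default values for water at room temperature are assumptions chosen for the example, and the article's scaling relation for the effective screening length is not reproduced here.

```python
import math

EPSILON_0 = 8.8541878128e-12  # vacuum permittivity, F/m
K_B = 1.380649e-23            # Boltzmann constant, J/K
E_CHARGE = 1.602176634e-19    # elementary charge, C
N_A = 6.02214076e23           # Avogadro constant, 1/mol

def debye_length(conc_molar, valences, rel_permittivity=78.4, temperature=298.15):
    """Debye length in metres for ion species with the given molar concentrations."""
    # Sum of number density (per m^3) times squared charge over all ion species.
    charge_term = sum(
        c * 1000.0 * N_A * (z * E_CHARGE) ** 2
        for c, z in zip(conc_molar, valences)
    )
    return math.sqrt(rel_permittivity * EPSILON_0 * K_B * temperature / charge_term)

# 0.1 M of a 1:1 salt in water: the result is a little under one nanometre.
lambda_d = debye_length([0.1, 0.1], [+1, -1])
```

Because the Debye length scales as the inverse square root of concentration, quadrupling the salt concentration halves it; deviations from this behaviour at high concentration are what motivate work on effective screening lengths.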

16.
Nat Commun ; 12(1): 1695, 2021 Mar 16.
Article in English | MEDLINE | ID: mdl-33727552

ABSTRACT

Organic synthesis remains a major challenge in drug discovery. Although a plethora of machine learning models have been proposed as solutions in the literature, they suffer from being opaque black-boxes. It is neither clear if the models are making correct predictions because they inferred the salient chemistry, nor is it clear which training data they are relying on to reach a prediction. This opaqueness hinders both model developers and users. In this paper, we quantitatively interpret the Molecular Transformer, the state-of-the-art model for reaction prediction. We develop a framework to attribute predicted reaction outcomes both to specific parts of reactants, and to reactions in the training set. Furthermore, we demonstrate how to retrieve evidence for predicted reaction outcomes, and understand counterintuitive predictions by scrutinising the data. Additionally, we identify Clever Hans predictions where the correct prediction is reached for the wrong reason due to dataset bias. We present a new debiased dataset that provides a more realistic assessment of model performance, which we propose as the new standard benchmark for comparing reaction prediction models.

17.
Nat Commun ; 11(1): 6280, 2020 Dec 08.
Article in English | MEDLINE | ID: mdl-33293567

ABSTRACT

Machine learning has the potential to accelerate materials discovery by accurately predicting materials properties at a low computational cost. However, the model inputs remain a key stumbling block. Current methods typically use descriptors constructed from knowledge of either the full crystal structure - therefore only applicable to materials with already characterised structures - or structure-agnostic fixed-length representations hand-engineered from the stoichiometry. We develop a machine learning approach that takes only the stoichiometry as input and automatically learns appropriate and systematically improvable descriptors from data. Our key insight is to treat the stoichiometric formula as a dense weighted graph between elements. Compared to the state of the art for structure-agnostic methods, our approach achieves lower errors with less data.
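
Editorial illustration (not from the article): the abstract describes treating a stoichiometric formula as a dense weighted graph between elements. A minimal sketch of that representation follows, under two assumptions that the abstract does not confirm: each element becomes a node weighted by its fractional amount, and every ordered pair of distinct elements becomes an edge. The function name `formula_to_graph` is hypothetical, and the parser handles only simple formulas without parentheses.

```python
import re
from itertools import permutations

def formula_to_graph(formula: str):
    """Return (node_weights, edges) for a simple formula such as 'Fe2O3'."""
    amounts = {}
    # Match an element symbol followed by an optional integer or decimal count.
    for element, count in re.findall(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?", formula):
        amounts[element] = amounts.get(element, 0.0) + (float(count) if count else 1.0)
    total = sum(amounts.values())
    # Node weight = fractional amount of the element in the formula.
    weights = {element: amount / total for element, amount in amounts.items()}
    # Dense graph: one directed edge for every ordered pair of distinct elements.
    edges = list(permutations(weights, 2))
    return weights, edges

weights, edges = formula_to_graph("Fe2O3")
```

In a learned model, node features and messages passed along these edges would be trained from data; the sketch only builds the input structure.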

18.
Proc Natl Acad Sci U S A ; 117(36): 21857-21864, 2020 Sep 08.
Article in English | MEDLINE | ID: mdl-32843349

ABSTRACT

The predictive capabilities of deep neural networks (DNNs) continue to evolve to increasingly impressive levels. However, it is still unclear how training procedures for DNNs succeed in finding parameters that produce good results for such high-dimensional and nonconvex loss functions. In particular, we wish to understand why simple optimization schemes, such as stochastic gradient descent, do not end up trapped in local minima with high loss values that would not yield useful predictions. We explain the optimizability of DNNs by characterizing the local minima and transition states of the loss-function landscape (LFL) along with their connectivity. We show that the LFL of a DNN in the shallow network or data-abundant limit is funneled, and thus easy to optimize. Crucially, in the opposite low-data/deep limit, although the number of minima increases, the landscape is characterized by many minima with similar loss values separated by low barriers. This organization is different from the hierarchical landscapes of structural glass formers and explains why minimization procedures commonly employed by the machine-learning community can navigate the LFL successfully and reach low-lying solutions.

19.
J Chem Inf Model ; 60(10): 4449-4456, 2020 Oct 26.
Article in English | MEDLINE | ID: mdl-32786696

ABSTRACT

The development of molecular descriptors is a central challenge in cheminformatics. Most approaches use algorithms that extract atomic environments or end-to-end machine learning. However, a looming question is how these approaches compare with the critical eye of trained chemists. The CAS fingerprint engages expert chemists to curate chemical motifs that they deem could influence bioactivity. In this paper, we benchmark the CAS fingerprint against commonly used fingerprints using a well-established benchmark set of 88 targets. We show that the CAS fingerprint outperforms most of the commonly used molecular fingerprints. Analysis of the CAS fingerprint reveals that experts tend to select features that are rarely reported in the literature, though not all rare features are selected. Our analysis also shows that the CAS fingerprint provides a different source of information compared to other commonly used fingerprints. These results suggest that anthropomorphic insights do have predictive power and highlight the importance of a chemist-in-the-loop approach in the era of machine learning.


Subjects
Algorithms; Machine Learning; Cheminformatics