ABSTRACT
In biomedical research, understanding the intricate structure of proteins is crucial, as these structures determine how proteins function within our bodies and interact with potential drugs. Traditionally, methods like X-ray crystallography and cryo-electron microscopy have been used to unravel these structures, but they are often challenging, time-consuming and costly. Recently, a breakthrough in computational biology has emerged with the development of deep learning algorithms capable of predicting protein structures based on their amino acid sequences (Jumper, J., et al. Nature 2021, 596, 583; Lane, T. J. Nature Methods 2023, 20, 170; Kryshtafovych, A., et al. Proteins: Structure, Function and Bioinformatics 2021, 89, 1607). This study focuses on predicting the dynamic changes that proteins undergo upon ligand binding, specifically when they bind to allosteric sites, i.e. pockets different from the active site. Allosteric modulators are particularly important for drug discovery, as they open new avenues for designing drugs that can target proteins more effectively and with fewer side effects (Nussinov, R.; Tsai, C. J. Cell 2013, 153, 293). To study this, we curated a data set of 578 X-ray structures of proteins displaying orthosteric and allosteric binding, together with a general framework to evaluate deep learning-based structure prediction methods. Our findings demonstrate the potential and current limitations of deep learning methods, such as AlphaFold2 (Jumper, J., et al. Nature 2021, 596, 583), NeuralPLexer (Qiao, Z., et al. Nat Mach Intell 2024, 6, 195), and RoseTTAFold All-Atom (Krishna, R., et al. Science 2024, 384, eadl2528), to predict not just static protein structures but also dynamic conformational changes.
Herein we show that predicting the allosteric induced-fit conformation still poses a challenge to deep learning methods, as they predict the orthosteric bound conformation more accurately than the allosteric induced-fit conformation. For AlphaFold2, we observed that conformational diversity and sampling between the apo and holo states could be increased by modifying the MSA depth, but this did not enhance the ability to generate conformations close to the allosteric induced-fit conformation. To further support advancements in the protein structure prediction field, the curated data set and evaluation framework are made publicly available.
ABSTRACT
Designing compounds with a range of desirable properties is a fundamental challenge in drug discovery. In pre-clinical early drug discovery, novel compounds are often designed based on an already existing promising starting compound through structural modifications for further property optimization. Recently, transformer-based deep learning models have been explored for the task of molecular optimization by training on pairs of similar molecules. This provides a starting point for generating similar molecules to a given input molecule, but has limited flexibility regarding user-defined property profiles. Here, we evaluate the effect of reinforcement learning on transformer-based molecular generative models. The generative model can be considered as a pre-trained model with knowledge of the chemical space close to an input compound, while reinforcement learning can be viewed as a tuning phase, steering the model towards chemical space with user-specific desirable properties. The evaluation of two distinct tasks (molecular optimization and scaffold discovery) suggests that reinforcement learning could guide the transformer-based generative model towards the generation of more compounds of interest. Additionally, the impact of pre-trained models, learning steps and learning rates is investigated.
Scientific contribution: Our study investigates the effect of reinforcement learning on a transformer-based generative model initially trained for generating molecules similar to starting molecules. The reinforcement learning framework is applied to facilitate multiparameter optimization of starting molecules. This approach allows more flexibility in optimizing user-specific property profiles and helps find more ideas of interest.
ABSTRACT
Design-Make-Test-Analyse (DMTA) is the discovery cycle through which molecules are designed, synthesised, and assayed to produce data that in turn are analysed to inform the next iteration. The process is repeated until viable drug candidates are identified, often requiring many cycles before reaching a sweet spot. The advent of artificial intelligence (AI) and cloud computing presents an opportunity to innovate drug discovery to reduce the number of cycles needed to yield a candidate. Here, we present the Predictive Insight Platform (PIP), a cloud-native modelling platform developed at AstraZeneca. The impact of PIP in each step of DMTA, as well as its architecture, integration, and usage, are discussed and used to provide insights into the future of drug discovery.
Subjects
Artificial Intelligence, Biological Assay, Drug Discovery
ABSTRACT
Siamese networks, representing a novel class of neural networks, consist of two identical subnetworks sharing weights but receiving different inputs. Here we present a similarity-based pairing method for generating compound pairs to train Siamese neural networks for regression tasks. In comparison with conventional exhaustive pairing, it reduces the algorithm complexity from O(n^2) to O(n). It also consistently results in better prediction performance on the three physicochemical datasets, using a multilayer perceptron with the circular fingerprint as a proof of concept. We further incorporate into a Siamese neural network the transformer-based Chemformer, which extracts task-specific features from the simplified molecular-input line-entry system (SMILES) representation of compounds. Additionally, we propose a means to measure the prediction uncertainty by utilizing the variance in predictions from a set of reference compounds. Our results demonstrate that high prediction accuracy correlates with high confidence. Finally, we investigate implications of the similarity property principle in machine learning.
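The difference in pair count between exhaustive and similarity-based pairing can be illustrated with a minimal sketch. It assumes fingerprints are given as sets of "on" bits, that similarity is measured with the Tanimoto coefficient, and that each compound is paired with its single most similar neighbour; none of these details are stated in the abstract, and the function names are hypothetical. The sketch only shows how the number of training pairs scales; the naive neighbour search used here is itself quadratic.

```python
from itertools import permutations

def tanimoto(a, b):
    """Tanimoto similarity between two sets of 'on' fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def exhaustive_pairs(fps):
    """All ordered pairs (i, j) with i != j: n * (n - 1) pairs."""
    return list(permutations(range(len(fps)), 2))

def nearest_neighbour_pairs(fps):
    """Pair each compound with its most similar other compound: n pairs.

    Assumption: one reference per compound; the source does not say
    how many neighbours are used.
    """
    pairs = []
    for i, fp in enumerate(fps):
        others = (j for j in range(len(fps)) if j != i)
        best = max(others, key=lambda j: tanimoto(fp, fps[j]))
        pairs.append((i, best))
    return pairs

# Toy fingerprints (hypothetical bit sets, not real circular fingerprints).
fps = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {7, 8, 10}]

print(len(exhaustive_pairs(fps)))     # grows quadratically with n
print(nearest_neighbour_pairs(fps))   # grows linearly with n
```

With four compounds the exhaustive scheme already yields three times as many pairs as the similarity-based one, and the gap widens as the dataset grows.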
ABSTRACT
Understanding allosteric regulation in biomolecules is of great interest to pharmaceutical research, and computational methods have emerged over recent decades to characterize allosteric coupling. However, the prediction of allosteric sites in a protein structure remains a challenging task. Here, we integrate local binding site information, coevolutionary information, and information on dynamic allostery into a structure-based three-parameter model to identify potentially hidden allosteric sites in ensembles of protein structures with orthosteric ligands. When tested on five allosteric proteins (LFA-1, p38-α, GR, MAT2A, and BCKDK), the model successfully ranked all known allosteric pockets in the top three positions. Finally, we identified a novel druggable site in MAT2A confirmed by X-ray crystallography and SPR and a hitherto unknown druggable allosteric site in BCKDK validated by biochemical and X-ray crystallography analyses. Our model can be applied in drug discovery to identify allosteric pockets.
ABSTRACT
It is axiomatic in medicinal chemistry that optimization of the potency of a small molecule at a macromolecular target requires complementarity between the ligand and target. In order to minimize the conformational penalty on binding, both enthalpically and entropically, it is therefore preferred to have the ligand preorganized in the bound conformation. In this Perspective, we highlight the role of allylic strain in controlling conformational preferences. Allylic strain was originally described for carbon-based allylic systems, but the same principles apply to other types of structure with sp2 or pseudo-sp2 arrangements. These systems include benzylic (including heteroaryl methyl) positions, amides, N-aryl groups, aryl ethers, and nucleotides. We have derived torsion profiles from small molecule X-ray structures for these systems. Through multiple examples, we show how these effects have been applied in drug discovery and how they can be used prospectively to influence conformation in the design process.
Subjects
Pharmaceutical Chemistry, Drug Discovery, Ligands, Molecular Conformation, Amides/chemistry
ABSTRACT
Matched molecular pairs (MMPs) are nowadays a commonly applied concept in drug design. They are used in many computational tools for structure-activity relationship analysis, biological activity prediction, or optimization of physicochemical properties. However, until now it has not been shown in a rigorous way that MMPs, that is, pairs of molecules differing by only one substituent, can be predicted with higher accuracy and precision than any other chemical compound pair. It is expected that any model should be able to predict such a defined change with high accuracy and reasonable precision. In this study, we examine the predictability of four classical properties relevant for drug design ranging from simple physicochemical parameters (log D and solubility) to more complex cell-based ones (permeability and clearance), using different data sets and machine learning algorithms. Our study confirms that additive data are the easiest to predict, which highlights the importance of recognizing nonadditivity events and the challenging complexity of predicting properties in the case of scaffold hopping. Despite deep learning being well suited to model nonlinear events, these methods do not seem to be an exception to this observation. Though they generally perform better than classical machine learning methods, this leaves the field with a still-standing challenge.
ABSTRACT
Peptides are an important modality in drug discovery. While current peptide optimization focuses predominantly on the small number of natural and commercially available non-natural amino acids, the chemical spaces available for small molecule drug discovery are in the billions of molecules. In the present study, we describe the development of a large virtual library of readily synthesizable non-natural amino acids that can power the virtual screening protocols and aid in peptide optimization. To that end, we enumerated nearly 380 thousand amino acids and demonstrated their vast chemical diversity compared to the 20 natural and commercial residues. Furthermore, we selected a diverse ten thousand amino acid subset to validate our virtual screening workflow on the Keap1-Neh2 complex model system. Through in silico mutations of Neh2 peptide residues to those from the virtual library, our docking-based protocol identified a number of possible solutions with a significantly higher predicted affinity toward the Keap1 protein. This protocol demonstrates that the non-natural amino acid chemical space can be massively extended and virtually screened with a reasonable computational cost.
Subjects
Amino Acids, NF-E2-Related Factor 2, Amino Acids/chemistry, Drug Discovery/methods, Kelch-Like ECH-Associated Protein 1, Molecular Docking Simulation, Peptides/chemistry
ABSTRACT
Computer-aided synthesis planning, suggesting synthetic routes for molecules of interest, is a rapidly growing field. The machine learning methods used are often dependent on access to large datasets for training, but finite experimental budgets limit how much data can be obtained from experiments. This suggests the use of schemes for data collection such as active learning, which identifies the data points of highest impact for model accuracy, and which has been used in recent studies with success. However, little has been done to explore the robustness of the methods predicting reaction yield when used together with active learning to reduce the amount of experimental data needed for training. This study aims to investigate the influence of machine learning algorithms and the number of initial data points on reaction yield prediction for two public high-throughput experimentation datasets. Our results show that active learning based on output margin reached a pre-defined AUROC faster than random sampling on both datasets. Analysis of feature importance of the trained machine learning models suggests active learning had a larger influence on the model accuracy when only a few features were important for the model prediction.
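Selection by output margin can be sketched as follows. This is a minimal illustration, assuming the margin is the difference between the two highest predicted class probabilities and that the candidates with the smallest margin are queried next; the abstract does not define the margin, and the function names and probabilities below are hypothetical.

```python
def margin(probs):
    """Difference between the two highest class probabilities.

    Assumption: this is one common definition of output margin;
    the source does not specify which one was used.
    """
    top, second = sorted(probs, reverse=True)[:2]
    return top - second

def select_by_margin(pool_probs, batch_size):
    """Indices of the batch_size pool items the model is least sure about."""
    ranked = sorted(range(len(pool_probs)), key=lambda i: margin(pool_probs[i]))
    return ranked[:batch_size]

# Hypothetical predicted probabilities (low yield, high yield) for four
# unlabelled reactions in the candidate pool.
pool_probs = [[0.95, 0.05], [0.55, 0.45], [0.70, 0.30], [0.51, 0.49]]

picked = select_by_margin(pool_probs, batch_size=2)
print(picked)  # indices of the reactions to run next
```

In an active learning loop, the selected reactions would be run, labelled, added to the training set, and the model retrained before the next selection round; random sampling replaces `select_by_margin` with a random draw from the pool.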
Subjects
Machine Learning
ABSTRACT
Proteins exist in several different conformations. These structural changes are often associated with fluctuations at the residue level. Recent findings show that co-evolutionary analysis coupled with machine-learning techniques improves the precision by providing quantitative distance predictions between pairs of residues. The predicted statistical distance distribution from Multi Sequence Analysis reveals the presence of different local maxima, suggesting the flexibility of key residue pairs. Here we investigate the ability of residue-residue distance prediction to provide insights into the protein conformational ensemble. We apply a combination of deep learning approaches and mechanistic modeling to a set of proteins that experimentally showed conformational changes. The predicted protein models were filtered based on energy scores and RMSD clustering, and the centroids were selected as the lowest-energy structure per cluster. These models were compared to the experimental structure relaxed by Molecular Dynamics (MD) by analyzing the backbone residue torsional distribution and the sidechain orientations. Our pipeline makes it possible to retrieve the structural dynamics experimentally represented by different X-ray conformations for the same sequence, as well as the conformational space observed in the MD simulations. We show the potential correlation between the experimental structure dynamics and the predicted model ensemble, demonstrating the susceptibility of the current state-of-the-art methods in protein folding and dynamics prediction and pointing out the areas of improvement.
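The filtering step described above (energy filter, then one lowest-energy representative per RMSD cluster) can be sketched as below. This is only an illustration under stated assumptions: cluster labels are taken as already computed by some RMSD clustering, the energy cutoff is a hypothetical parameter, and the record layout and function name are invented for the example.

```python
def pick_representatives(models, energy_cutoff):
    """Keep models at or below the cutoff, then take the lowest-energy
    model in each cluster as that cluster's representative.

    Assumption: each model is a dict with 'name', 'energy' and a
    precomputed 'cluster' label; the source does not describe its data
    layout or the cutoff it used.
    """
    best = {}
    for model in models:
        if model["energy"] > energy_cutoff:
            continue  # discard high-energy models before clustering
        current = best.get(model["cluster"])
        if current is None or model["energy"] < current["energy"]:
            best[model["cluster"]] = model
    return [best[cluster]["name"] for cluster in sorted(best)]

# Hypothetical predicted models with energy scores and cluster labels.
models = [
    {"name": "m1", "energy": -120.0, "cluster": 0},
    {"name": "m2", "energy": -135.5, "cluster": 0},
    {"name": "m3", "energy": -90.0, "cluster": 1},
    {"name": "m4", "energy": -50.0, "cluster": 2},
]

print(pick_representatives(models, energy_cutoff=-80.0))
```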
Subjects
Molecular Dynamics Simulation, Proteins, Machine Learning, Protein Conformation, Protein Folding, Proteins/chemistry
ABSTRACT
Contrary to expectation, N-aryl pyrrolidinones (and isosteric imidazolinones and oxazolinones) are more lipophilic and less soluble than the corresponding piperidinones (tetrahydropyrimidinones and oxazinones). Exploration of the basis for these results uncovered a subtle interplay of steric and electronic effects that results in different conformations for the two classes of compounds, which in turn drive the observed effects.
Subjects
Pyrrolidinones, Molecular Conformation
ABSTRACT
Molecular optimization aims to improve the drug profile of a starting molecule. It is a fundamental problem in drug discovery but challenging due to (i) the requirement of simultaneous optimization of multiple properties and (ii) the large chemical space to explore. Recently, deep learning methods have been proposed to solve this task by mimicking the chemist's intuition in terms of matched molecular pairs (MMPs). Although the use of MMPs is a strategy widely adopted by medicinal chemists, it offers limited capability for exploring the space of structural modifications and therefore does not cover the complete space of solutions. Often more general transformations beyond the nature of MMPs are feasible and/or necessary, e.g. simultaneous modifications of the starting molecule at different places including the core scaffold. This study aims to provide a general methodology that offers more general structural modifications beyond MMPs. In particular, the same Transformer architecture is trained on different datasets. These datasets consist of sets of molecular pairs which reflect different types of transformations. Beyond MMP transformation, datasets reflecting general structural changes are constructed from ChEMBL based on two approaches: Tanimoto similarity (allows for multiple modifications) and scaffold matching (allows for multiple modifications but keeps the scaffold constant). We investigate how the model behavior can be altered by tailoring the dataset while using the same model architecture. Our results show that the models trained on differently prepared datasets transform a given starting molecule in a way that reflects the nature of the dataset used for training the model. These models could complement each other and unlock the capability for chemists to pursue different options for improving a starting molecule.
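The two pair-construction criteria mentioned above can be contrasted in a small sketch. It assumes each molecule carries a set of fingerprint "on" bits and a precomputed scaffold label; the similarity threshold, the record layout, and the function names are hypothetical and not taken from the study. In practice a cheminformatics toolkit would supply the fingerprints and scaffolds.

```python
from itertools import combinations

def tanimoto(a, b):
    """Tanimoto coefficient: shared bits divided by bits set in either."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def tanimoto_pairs(mols, threshold):
    """Pairs whose fingerprint similarity reaches the threshold.

    Assumption: the threshold value is illustrative only.
    """
    return [(x["id"], y["id"]) for x, y in combinations(mols, 2)
            if tanimoto(x["bits"], y["bits"]) >= threshold]

def scaffold_pairs(mols):
    """Pairs that share the same (precomputed) scaffold label."""
    return [(x["id"], y["id"]) for x, y in combinations(mols, 2)
            if x["scaffold"] == y["scaffold"]]

# Toy molecules with hypothetical bit sets and scaffold labels.
mols = [
    {"id": "A", "bits": {1, 2, 3, 4}, "scaffold": "s1"},
    {"id": "B", "bits": {1, 2, 3, 5}, "scaffold": "s1"},
    {"id": "C", "bits": {1, 2, 3, 6}, "scaffold": "s2"},
    {"id": "D", "bits": {7, 8, 9}, "scaffold": "s2"},
]

print(tanimoto_pairs(mols, threshold=0.5))
print(scaffold_pairs(mols))
```

The toy data shows why the two datasets differ: A and C are similar by fingerprint but have different scaffolds, while C and D share a scaffold label despite dissimilar fingerprints, so each criterion admits pairs the other rejects.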
ABSTRACT
The conformational behavior of a small molecule free in solution is important to understand the free energy of binding to its target. This could be of special interest for proteolysis-targeting chimeras (PROTACs) due to their often flexible and lengthy linkers and the need to induce a ternary complex. Here, we report on the molecular dynamics (MD) simulations of two PROTACs, MZ1 and dBET6, revealing different linker conformational behaviors. The simulation of MZ1 in dimethyl sulfoxide (DMSO) agrees well with the nuclear magnetic resonance study, providing strong support for the relevance of our simulations. To further understand the role of linker plasticity in the formation of a ternary complex, the dissociation of the von Hippel-Lindau-MZ1-BRD4 complex is investigated in detail by steered simulations and is shown to follow a two-step pathway. Interestingly, both MZ1 and dBET6 display, in water, a tendency toward an intramolecular lipophilic interaction between the two warheads. The hydrophobic contact of the two warheads would prevent them from binding to their respective proteins and might have an effect on the efficacy of induced cellular protein degradation. However, conformations featuring this hydrophobic contact of the two warheads are calculated to be marginally more favorable.
Subjects
Nuclear Proteins, Ubiquitin-Protein Ligases, Nuclear Proteins/metabolism, Proteolysis, Transcription Factors/metabolism, Ubiquitin-Protein Ligases/chemistry, Ubiquitin-Protein Ligases/metabolism
ABSTRACT
Aromatic and heteroaromatic amines (ArNH2) are activated by cytochrome P450 monooxygenases, primarily CYP1A2, into reactive N-arylhydroxylamines that can lead to covalent adducts with DNA nucleobases. Here, we give hands-on mechanism-based guidelines to design mutagenicity-free ArNH2. The mechanism of N-hydroxylation of ArNH2 by CYP1A2 is investigated by density functional theory (DFT) calculations. Two putative pathways are considered: the radicaloid route that goes via the classical ferryl-oxo oxidant, and an alternative anionic pathway through Fenton-like oxidation by ferriheme-bound H2O2. Results suggest that bioactivation of ArNH2 follows the anionic pathway. We demonstrate that H-bonding and/or geometric fit of ArNH2 to CYP1A2 as well as feasibility of both proton abstraction by the ferriheme-peroxo base and heterolytic cleavage of arylhydroxylamines render molecules mutagenic. Mutagenicity of ArNH2 can be removed by structural alterations that disrupt geometric and/or electrostatic fit to CYP1A2, decrease the acidity of the NH2 group, destabilize arylnitrenium ions, or disrupt their pre-covalent transition states with guanine.
Subjects
Amines/metabolism, Cytochrome P-450 CYP1A2/metabolism, Heterocyclic Compounds/metabolism, Aromatic Hydrocarbons/metabolism, Mutagens/metabolism, Amines/chemistry, Catalytic Domain, X-Ray Crystallography, Cytochrome P-450 CYP1A2/chemistry, Density Functional Theory, Discriminant Analysis, Heterocyclic Compounds/chemistry, Humans, Aromatic Hydrocarbons/chemistry, Hydroxylation, Least-Squares Analysis, Chemical Models, Molecular Structure, Mutagens/chemistry, Protein Binding
ABSTRACT
The glucocorticoid receptor (GR) is a nuclear receptor that controls critical biological processes by regulating the transcription of specific genes. There is a known allosteric cross-talk between the ligand and coregulator binding sites within the GR ligand-binding domain that is crucial for the control of the functional response. However, the molecular mechanisms underlying such an allosteric control remain elusive. Here, molecular dynamics (MD) simulations, bioinformatic analysis, and biophysical measurements are integrated to capture the structural and dynamic features of the allosteric cross-talk within the GR. We identified a network of evolutionarily conserved residues that enables the allosteric signal transduction, in agreement with experimental data. MD simulations clarify how such a network is dynamically interconnected and offer a mechanistic explanation of how different peptides affect the intensity of the allosteric signal. This study provides useful insights to elucidate the GR allosteric regulation, ultimately providing a foundation for designing novel drugs.
Subjects
Peptides, Glucocorticoid Receptors, Allosteric Regulation, Allosteric Site, Binding Sites, Humans, Ligands, Protein Binding, Glucocorticoid Receptors/metabolism
ABSTRACT
Starting from our previously described PI3Kγ inhibitors, we describe the exploration of structure-activity relationships that led to the discovery of highly potent dual PI3Kγδ inhibitors. We explored changes in two positions of the molecules, including macrocyclization, but ultimately identified a simpler series with the desired potency profile that had suitable physicochemical properties for inhalation. We were able to demonstrate efficacy in a rat ovalbumin challenge model of allergic asthma and in cells derived from asthmatic patients. The optimized compound, AZD8154, has a long duration of action in the lung and low systemic exposure coupled with high selectivity against off-targets.
Subjects
Asthma/drug therapy, Class Ib Phosphatidylinositol 3-Kinase/metabolism, Protein Kinase Inhibitors/therapeutic use, Sulfonamides/therapeutic use, Thiazoles/therapeutic use, Animals, Asthma/chemically induced, Class I Phosphatidylinositol 3-Kinases/metabolism, X-Ray Crystallography, Humans, Mononuclear Leukocytes/drug effects, Male, Molecular Structure, Ovalbumin, Phosphatidylinositol 3-Kinases/metabolism, Protein Binding, Protein Kinase Inhibitors/chemical synthesis, Protein Kinase Inhibitors/metabolism, Protein Kinase Inhibitors/pharmacokinetics, Inbred BN Rats, Structure-Activity Relationship, Sulfonamides/chemical synthesis, Sulfonamides/metabolism, Sulfonamides/pharmacokinetics, Thiazoles/chemical synthesis, Thiazoles/metabolism, Thiazoles/pharmacokinetics
ABSTRACT
Activity prediction plays an essential role in drug discovery by directing the search for drug candidates in the relevant chemical space. Despite being applied successfully to image recognition and semantic similarity, the Siamese neural network has rarely been explored in drug discovery, where modelling faces challenges such as insufficient data and class imbalance. Here, we present a Siamese recurrent neural network model (SiameseCHEM) based on bidirectional long short-term memory architecture with a self-attention mechanism, which can automatically learn discriminative features from the SMILES representations of small molecules. Subsequently, it is used to categorize bioactivity of small molecules via N-shot learning. Trained on random SMILES strings, it proves robust across five different datasets for the task of binary or categorical classification of bioactivity. Benchmarked against two baseline machine learning models which use the chemistry-rich ECFP fingerprints as the input, the deep learning model outperforms them on three datasets and achieves comparable performance on the other two. The failure of both baseline methods on SMILES strings highlights that the deep learning model may learn task-specific chemistry features encoded in SMILES strings.
ABSTRACT
A main challenge in drug discovery is finding molecules with a desirable balance of multiple properties. Here, we focus on the task of molecular optimization, where the goal is to optimize a given starting molecule towards desirable properties. This task can be framed as a machine translation problem in natural language processing, where in our case, a molecule is translated into a molecule with optimized properties based on the SMILES representation. Typically, chemists would use their intuition to suggest chemical transformations for the starting molecule being optimized. A widely used strategy is the concept of matched molecular pairs, where two molecules differ by a single transformation. We seek to capture the chemist's intuition from matched molecular pairs using machine translation models. Specifically, the sequence-to-sequence model with attention mechanism and the Transformer model are employed to generate molecules with desirable properties. As a proof of concept, three ADMET properties are optimized simultaneously: logD, solubility, and clearance, which are important properties of a drug. Since desirable properties often vary from project to project, the user-specified desirable property changes are incorporated into the input as an additional condition together with the starting molecules being optimized. Thus, the models can be guided to generate molecules satisfying the desirable properties. Additionally, we compare the two machine translation models based on the SMILES representation with a graph-to-graph translation model, HierG2G, which has shown state-of-the-art performance in molecular optimization. Our results show that the Transformer can generate more molecules with desirable properties by making small modifications to the given starting molecules, which can be intuitive to chemists. A further enrichment of diverse molecules can be achieved by using an ensemble of models.
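Incorporating the desired property changes into the input as an additional condition can be pictured with a minimal sketch. It assumes the condition is encoded as extra tokens placed in front of the tokenized starting SMILES; the token format (`logD_decrease` and so on), the simplified tokenizer, and the function names are hypothetical illustrations, not the encoding used in the study.

```python
import re

# Simplified SMILES tokenizer: bracket atoms, two-letter halogens,
# two-digit ring closures, otherwise single characters.
# Assumption: adequate for illustration, not a complete SMILES grammar.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")

def tokenize(smiles):
    """Split a SMILES string into model tokens."""
    return SMILES_TOKEN.findall(smiles)

def build_source(smiles, property_changes):
    """Prepend one condition token per property to the SMILES tokens.

    property_changes maps a property name to the desired change;
    the '<property>_<change>' token format is hypothetical.
    """
    condition = [f"{name}_{change}" for name, change in property_changes.items()]
    return condition + tokenize(smiles)

source = build_source(
    "CC(Cl)O",
    {"logD": "decrease", "solubility": "increase", "clearance": "no_change"},
)
print(source)
```

A translation model trained on such sequences sees the desired change and the starting molecule together, so at inference time the same starting molecule can be paired with different condition tokens to request different property profiles.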
ABSTRACT
Lead generation for difficult-to-drug targets that have large, featureless, and highly lipophilic or highly polar and/or flexible binding sites is highly challenging. Here, we describe how cores of macrocyclic natural products can serve as a high-quality in silico screening library that provides leads for difficult-to-drug targets. Two iterative rounds of docking of a carefully selected set of natural-product-derived cores led to the discovery of an uncharged macrocyclic inhibitor of the Keap1-Nrf2 protein-protein interaction, a particularly challenging target due to its highly polar binding site. The inhibitor displays cellular efficacy and is well-positioned for further optimization based on the structure of its complex with Keap1 and synthetic access. We believe that our work will spur interest in using macrocyclic cores for in silico-based lead generation and also inspire the design of future macrocycle screening collections.
Subjects
Biological Products/chemistry, Polycyclic Compounds/chemical synthesis, Polycyclic Compounds/pharmacology, Computer Simulation, Data Mining, Factual Databases, Drug Discovery, Preclinical Drug Evaluation, Humans, Kelch-Like ECH-Associated Protein 1/antagonists & inhibitors, Kelch-Like ECH-Associated Protein 1/chemistry, Liver Microsomes, Molecular Models, Molecular Docking Simulation, NF-E2-Related Factor 2, Polycyclic Compounds/chemistry, Solubility, Structure-Activity Relationship
ABSTRACT
In the past few years, we have witnessed a renaissance of the field of molecular de novo drug design. The advancements in deep learning and artificial intelligence (AI) have triggered an avalanche of ideas on how to translate such techniques to a variety of domains, including the field of drug design. A range of architectures have been devised to find the optimal way of generating chemical compounds by using either graph- or string (SMILES)-based representations. With this application note, we aim to offer the community a production-ready tool for de novo design, called REINVENT. It can be effectively applied to drug discovery projects that are striving to resolve either exploration or exploitation problems while navigating the chemical space. It can facilitate the idea generation process by bringing the most promising compounds to the researcher's attention. REINVENT's code is publicly available at https://github.com/MolecularAI/Reinvent.