ABSTRACT
The incredible capabilities of generative artificial intelligence models have inevitably led to their application in the domain of drug discovery. Within this domain, the vastness of chemical space motivates the development of more efficient methods for identifying regions with molecules that exhibit desired characteristics. In this work, we present a computationally efficient active learning methodology and demonstrate its applicability to targeted molecular generation. When applied to c-Abl kinase, a protein with FDA-approved small-molecule inhibitors, the model learns to generate molecules similar to the inhibitors without prior knowledge of their existence and even reproduces two of them exactly. We also show that the methodology is effective for a protein without any commercially available small-molecule inhibitors, the HNH domain of the CRISPR-associated protein 9 (Cas9) enzyme. To facilitate implementation and reproducibility, we made all of our software available through the open-source ChemSpaceAL Python package.
Subjects
Artificial Intelligence; Problem-Based Learning; Reproducibility of Results; Software; Drug Discovery
ABSTRACT
The incredible capabilities of generative artificial intelligence models have inevitably led to their application in the domain of drug discovery. Within this domain, the vastness of chemical space motivates the development of more efficient methods for identifying regions with molecules that exhibit desired characteristics. In this work, we present a computationally efficient active learning methodology that requires evaluation of only a subset of the generated data in the constructed sample space to successfully align a generative model with respect to a specified objective. We demonstrate the applicability of this methodology to targeted molecular generation by fine-tuning a GPT-based molecular generator toward a protein with FDA-approved small-molecule inhibitors, c-Abl kinase. Remarkably, the model learns to generate molecules similar to the inhibitors without prior knowledge of their existence, and even reproduces two of them exactly. We also show that the methodology is effective for a protein without any commercially available small-molecule inhibitors, the HNH domain of the CRISPR-associated protein 9 (Cas9) enzyme. We believe that the inherent generality of this method ensures that it will remain applicable as the exciting field of in silico molecular generation evolves. To facilitate implementation and reproducibility, we have made all of our software available through the open-source ChemSpaceAL Python package.
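To illustrate what evaluating only a subset of the generated data can look like, the following minimal sketch groups generated samples into clusters, scores a small random fraction of each cluster, and then draws a fine-tuning pool with probability proportional to each cluster's mean score. It is one plausible reading of the idea under stated assumptions, not the ChemSpaceAL API: the function `select_training_pool`, the `cluster_of` and `score` callables, and the parameters `frac` and `n_out` are hypothetical names introduced here.

```python
import random
from collections import defaultdict


def select_training_pool(samples, cluster_of, score, frac=0.1, n_out=100, seed=0):
    """Score a fraction of each cluster, then resample by cluster quality.

    samples    : list of generated items (hypothetical stand-ins for molecules)
    cluster_of : callable mapping an item to a cluster label (assumed given)
    score      : expensive objective, called only on the evaluated subset
    Returns the resampled pool and the number of expensive evaluations made.
    """
    rng = random.Random(seed)
    clusters = defaultdict(list)
    for s in samples:
        clusters[cluster_of(s)].append(s)

    # Evaluate a small random subset of every cluster.
    mean_score = {}
    n_scored = 0
    for label, members in clusters.items():
        k = max(1, int(frac * len(members)))
        subset = rng.sample(members, k)
        mean_score[label] = sum(score(m) for m in subset) / k
        n_scored += k

    # Shift mean scores so they become positive sampling weights.
    low = min(mean_score.values())
    labels = list(clusters)
    weights = [mean_score[c] - low + 1e-9 for c in labels]

    # Better-scoring clusters contribute more items to the fine-tuning pool.
    picks = rng.choices(labels, weights=weights, k=n_out)
    pool = [rng.choice(clusters[c]) for c in picks]
    return pool, n_scored


# Toy usage: integers stand in for molecules, ten clusters of 100 items each.
toy = list(range(1000))
pool, n_scored = select_training_pool(
    toy, cluster_of=lambda x: x // 100, score=lambda x: float(x // 100)
)
print(n_scored)  # 100 expensive evaluations instead of 1000
```

Sampling whole clusters by their estimated quality is what lets the unevaluated items benefit from the scores of their evaluated neighbours; how clusters and scores are actually defined is left open in this sketch.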
ABSTRACT
Many biological processes are regulated by allosteric mechanisms that communicate with distant sites in the protein responsible for functionality. The binding of a small molecule at an allosteric site typically induces conformational changes that propagate through the protein along allosteric pathways regulating enzymatic activity. Elucidating those communication pathways from allosteric sites to orthosteric sites is, therefore, essential to gain insights into biochemical processes. Targeting the allosteric pathways by mutagenesis can allow the engineering of proteins with desired functions. Furthermore, binding small-molecule modulators along the allosteric pathways is a viable approach to target reactions using allosteric inhibitors/activators with temporal and spatial selectivity. Methods based on network theory can elucidate protein communication networks through the analysis of pairwise correlations observed in molecular dynamics (MD) simulations using molecular descriptors that serve as proxies for allosteric information. Typically, a single atomic descriptor, such as α-carbon displacement, serves as that proxy, so the resulting allosteric network reflects only the correlations revealed by that descriptor. Here, we introduce MDiGest, a Python software package that provides a comprehensive toolkit for studying allostery from MD simulations of biochemical systems. MDiGest offers the ability to describe protein dynamics by combining different approaches, such as correlations of atomic displacements or dihedral angles, as well as a novel approach based on the correlation of Kabsch-Sander electrostatic couplings. MDiGest allows for comparisons of networks and community structures that capture physical information relevant to allostery. Multiple complementary tools for studying essential dynamics include principal component analysis, root mean square fluctuation, as well as secondary structure-based analyses.
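The network-theory idea described above can be sketched in a few lines: compute pairwise correlations between per-residue descriptor time series, keep the strongly correlated pairs as weighted edges, and trace the shortest path between two sites. This is a generic illustration using only the Python standard library, not the MDiGest API; the function names, the `-log|r|` edge weight, and the `threshold` parameter are assumptions made for this example.

```python
import heapq
import math


def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def correlation_network(series, threshold=0.3):
    """Build an undirected graph from per-residue descriptor time series.

    series: dict mapping residue label -> list of descriptor values per frame.
    Edge weight is -log|r|, so strongly correlated pairs are "close".
    """
    nodes = list(series)
    graph = {n: {} for n in nodes}
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            r = abs(pearson(series[a], series[b]))
            if r >= threshold:
                w = -math.log(min(r, 1.0))
                graph[a][b] = w
                graph[b][a] = w
    return graph


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns the node list or None if unreachable."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, math.inf):
            continue
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


# Toy data: B is correlated with both A and C, while A and C are uncorrelated.
toy_series = {
    "A": [1, 1, 1, 1, -1, -1, -1, -1],
    "B": [2, 2, 0, 0, 0, 0, -2, -2],
    "C": [1, 1, -1, -1, 1, 1, -1, -1],
}
g = correlation_network(toy_series)
print(shortest_path(g, "A", "C"))  # ['A', 'B', 'C']
```

Swapping the descriptor (displacements, dihedral angles, or another per-residue quantity) only changes the contents of `series`; the network construction stays the same, which is why the choice of descriptor determines what the network can reveal.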
Subjects
Molecular Dynamics Simulation; Proteins; Allosteric Regulation; Proteins/chemistry; Allosteric Site
ABSTRACT
Allosteric drugs have the potential to revolutionize biomedicine due to their enhanced selectivity and protection against overdosage. However, we need to better understand allosteric mechanisms in order to fully harness their potential in drug discovery. In this study, molecular dynamics simulations and nuclear magnetic resonance spectroscopy are used to investigate how increases in temperature affect allostery in imidazole glycerol phosphate synthase. Results demonstrate that temperature increase triggers a cascade of local amino acid-to-amino acid dynamics that remarkably resembles the allosteric activation that takes place upon effector binding. The differences in the allosteric response elicited by temperature increase as opposed to effector binding depend on the alterations of collective motions induced by either mode of activation. This work provides an atomistic picture of temperature-dependent allostery, which could be harnessed to more precisely control enzyme function.
Subjects
Glycerol; Molecular Dynamics Simulation; Allosteric Site; Allosteric Regulation; Amino Acids; Imidazoles/chemistry; Phosphates
ABSTRACT
Applying deep learning concepts from image detection and graph theory has greatly advanced protein-ligand binding affinity prediction, a challenge with enormous ramifications for both drug discovery and protein engineering. We build upon these advances by designing a novel deep learning architecture consisting of a 3-dimensional convolutional neural network utilizing channel-wise attention and two graph convolutional networks utilizing attention-based aggregation of node features. HAC-Net (Hybrid Attention-Based Convolutional Neural Network) obtains state-of-the-art results on the PDBbind v.2016 core set, the most widely recognized benchmark in the field. We extensively assess the generalizability of our model using multiple train-test splits, each of which maximizes differences between either protein structures, protein sequences, or ligand extended-connectivity fingerprints of complexes in the training and test sets. Furthermore, we perform 10-fold cross-validation with a similarity cutoff between SMILES strings of ligands in the training and test sets and also evaluate the performance of HAC-Net on lower-quality data. We envision that this model can be extended to a broad range of supervised learning problems related to structure-based biomolecular property prediction. All of our software is available as an open-source repository at https://github.com/gregory-kyro/HAC-Net/, and the HACNet Python package is available through PyPI.
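The cross-validation with a similarity cutoff mentioned above can be illustrated with a small sketch: for each fold, any training ligand whose SMILES string is too similar to a test ligand is dropped from the training set. The similarity measure used here (`difflib.SequenceMatcher.ratio`) is a stand-in chosen for this example and is an assumption, not necessarily the metric used for HAC-Net; `folds_with_cutoff`, `n_folds`, and `cutoff` are likewise hypothetical names.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in string similarity in [0, 1]; an assumption, not the paper's metric."""
    return SequenceMatcher(None, a, b).ratio()


def folds_with_cutoff(smiles, n_folds=10, cutoff=0.8):
    """Yield (train, test) index lists for k-fold cross-validation.

    Training items whose similarity to any test item reaches the cutoff
    are excluded, so the model is never tested on near-duplicates.
    """
    idx = list(range(len(smiles)))
    for f in range(n_folds):
        test = idx[f::n_folds]  # every n_folds-th item, starting at offset f
        held = set(test)
        train = [
            i for i in idx
            if i not in held
            and all(similarity(smiles[i], smiles[j]) < cutoff for j in test)
        ]
        yield train, test
```

With a stricter (lower) cutoff the training sets shrink, which trades training data for a more demanding test of generalization.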
Subjects
Neural Networks, Computer; Proteins; Ligands; Proteins/chemistry; Protein Binding; Software
ABSTRACT
Artificial photosynthesis is an attractive strategy for converting solar energy into fuels, largely because the Earth receives enough solar energy in one hour to meet humanity's energy needs for an entire year. However, developing devices for artificial photosynthesis remains difficult and requires computational approaches to guide and assist the interpretation of experiments. In this Perspective, we discuss current and future computational approaches, as well as the challenges of designing and characterizing molecular assemblies that absorb solar light, transfer electrons between interfaces, and catalyze water-splitting and fuel-forming reactions.
Subjects
Photosynthesis; Solar Energy; Sunlight; Electron Transport; Water
ABSTRACT
Many bacteria possess a type-II immune system against invading phages or plasmids, the clustered regularly interspaced short palindromic repeat (CRISPR)/CRISPR-associated 9 (Cas9) system, which detects and degrades foreign DNA sequences. The Cas9 protein has two endonuclease domains responsible for double-strand breaks (the HNH domain, which cleaves the target strand of DNA duplexes, and the RuvC domain, which cleaves the nontarget strand) and a single-guide RNA-binding domain where the RNA and target DNA strands are base-paired. Three engineered single Lys-to-Ala HNH mutants (K810A, K848A, and K855A) exhibit enhanced substrate specificity for cleavage of the target DNA strand. We report in this study that in the wild-type (wt) enzyme, D835, Y836, and D837 within the Y836-containing loop (comprising E827-D837) adjacent to the catalytic site have ¹H-¹⁵N nuclear magnetic resonance (NMR) features that are too broadened to characterize, whereas the remaining residues in the loop show varying degrees of NMR line broadening. We find that this loop in the wt enzyme exhibits three distinct conformations over the duration of the molecular dynamics simulations, whereas the three Lys-to-Ala mutants retain only one conformation. The versatility of multiple alternate conformations of this loop in the wt enzyme could help to recruit noncognate DNA substrates into the HNH active site for cleavage, thereby reducing its substrate specificity relative to the three mutants. Our study provides further experimental and computational evidence that Lys-to-Ala substitutions reduce dynamics of proteins and thus increase their stability.
Subjects
CRISPR-Cas Systems; Endonucleases; CRISPR-Associated Protein 9/genetics; CRISPR-Cas Systems/genetics; Clustered Regularly Interspaced Short Palindromic Repeats; DNA/chemistry; DNA/genetics; Endonucleases/chemistry
ABSTRACT
The CRISPR-associated protein 9 (Cas9) has been engineered as a precise gene editing tool to make double-strand breaks. Cas9 binds the folded guide RNA (gRNA), which serves as a binding scaffold and guides it to the target DNA duplex via a RecA-like strand-displacement mechanism but without ATP binding or hydrolysis. The target search begins with the protospacer adjacent motif (PAM)-interacting domain, which recognizes the PAM at the major groove of the duplex and melts the downstream duplex, where an RNA-DNA heteroduplex is formed at nanomolar affinity. The rate-limiting step is the formation of an R-loop structure where the HNH domain inserts between the target heteroduplex and the displaced non-target DNA strand. Once the R-loop structure is formed, the non-target strand is rapidly cleaved by RuvC and ejected from the active site. This event is immediately followed by cleavage of the target DNA strand by the HNH domain and product release. Within Cas9, the HNH domain is inserted into the RuvC domain near the RuvC active site via two linker loops that provide allosteric communication between the two active sites. Due to the high flexibility of these loops and active sites, biophysical techniques have been instrumental in characterizing the dynamics and mechanism of the Cas9 nucleases, aiding structural studies in the visualization of the complete active sites and relevant linker structures. Here, we review biochemical, structural, and biophysical studies on the underlying mechanism with emphasis on how Cas9 selects the target DNA duplex and rejects non-target sequences.