Results 1 - 20 of 25
1.
J Chem Inf Model ; 63(24): 7689-7698, 2023 Dec 25.
Article in English | MEDLINE | ID: mdl-38055952

ABSTRACT

Transformer-based large language models have remarkable potential to accelerate design optimization for applications such as drug development and material discovery. Self-supervised pretraining of transformer models requires large-scale data sets, which are often sparsely populated in topical areas such as polymer science. State-of-the-art approaches for polymers conduct data augmentation to generate additional samples but unavoidably incur extra computational costs. In contrast, large-scale open-source data sets are available for small molecules and provide a potential solution to data scarcity through transfer learning. In this work, we show that using transformers pretrained on small molecules and fine-tuned on polymer properties achieves comparable accuracy to those trained on augmented polymer data sets for a series of benchmark prediction tasks.


Subjects
Benchmarking, Drug Development, Electric Power Supplies, Language, Polymers
2.
Proc Natl Acad Sci U S A ; 117(39): 24258-24268, 2020 09 29.
Article in English | MEDLINE | ID: mdl-32913056

ABSTRACT

The small GTPase KRAS is localized at the plasma membrane where it functions as a molecular switch, coupling extracellular growth factor stimulation to intracellular signaling networks. In this process, KRAS recruits effectors, such as RAF kinase, to the plasma membrane where they are activated by a series of complex molecular steps. Defining the membrane-bound state of KRAS is fundamental to understanding the activation of RAF kinase and in evaluating novel therapeutic opportunities for the inhibition of oncogenic KRAS-mediated signaling. We combined multiple biophysical measurements and computational methodologies to generate a consensus model for authentically processed, membrane-anchored KRAS. In contrast to the two membrane-proximal conformations previously reported, we identify a third significantly populated state using a combination of neutron reflectivity, fast photochemical oxidation of proteins (FPOP), and NMR. In this highly populated state, which we refer to as "membrane-distal" and estimate to comprise ∼90% of the ensemble, the G-domain does not directly contact the membrane but is tethered via its C-terminal hypervariable region and carboxymethylated farnesyl moiety, as shown by FPOP. Subsequent interaction of the RAF1 RAS binding domain with KRAS does not significantly change G-domain configurations on the membrane but affects their relative populations. Overall, our results are consistent with a directional fly-casting mechanism for KRAS, in which the membrane-distal state of the G-domain can effectively recruit RAF kinase from the cytoplasm for activation at the membrane.


Subjects
Proto-Oncogene Proteins p21(ras)/metabolism, raf Kinases/metabolism, Cell Membrane/metabolism, Molecular Dynamics Simulation
3.
Proc Natl Acad Sci U S A ; 116(11): 5086-5095, 2019 03 12.
Article in English | MEDLINE | ID: mdl-30808805

ABSTRACT

The lysosomal enzyme glucocerebrosidase-1 (GCase) catalyzes the cleavage of a major glycolipid glucosylceramide into glucose and ceramide. The absence of fully functional GCase leads to the accumulation of its lipid substrates in lysosomes, causing Gaucher disease, an autosomal recessive disorder that displays profound genotype-phenotype nonconcordance. More than 250 disease-causing mutations in GBA1, the gene encoding GCase, have been discovered, although only one of these, N370S, causes 70% of disease. Here, we have used a knowledge-based docking protocol that considers experimental data of protein-protein binding to generate a complex between GCase and its known facilitator protein saposin C (SAPC). Multiscale molecular-dynamics simulations were used to study lipid self-assembly, membrane insertion, and the dynamics of the interactions between different components of the complex. Deep learning was applied to propose a model that explains the mechanism of GCase activation, which requires SAPC. Notably, we find that conformational changes in the loops at the entrance of the substrate-binding site are stabilized by direct interactions with SAPC and that the loss of such interactions induced by N370S and another common mutation, L444P, results in destabilization of the complex and reduced GCase activation. Our findings provide an atomistic-level explanation for GCase activation and the precise mechanism through which N370S and L444P cause Gaucher disease.


Subjects
Deep Learning, Gaucher Disease/enzymology, Gaucher Disease/physiopathology, Glucosylceramidase/metabolism, Molecular Dynamics Simulation, Catalytic Domain, Enzyme Activation, Glucosylceramidase/chemistry, Humans, Hydrogen Bonding, Mutant Proteins/chemistry, Protein Interaction Maps, Protein Secondary Structure, Saposins/metabolism
4.
Int J High Perform Comput Appl ; 36(5-6): 587-602, 2022 Nov.
Article in English | MEDLINE | ID: mdl-38603308

ABSTRACT

The COVID-19 pandemic highlights the need for computational tools to automate and accelerate drug design for novel protein targets. We leverage deep learning language models to generate and score drug candidates based on predicted protein binding affinity. We pre-trained a deep learning language model (BERT) on ∼9.6 billion molecules and achieved peak performance of 603 petaflops in mixed precision. Our work reduces pre-training time from days to hours, compared to previous efforts with this architecture, while also increasing the dataset size by nearly an order of magnitude. For scoring, we fine-tuned the language model using an assembled set of thousands of protein targets with binding affinity data and searched for inhibitors of specific protein targets, SARS-CoV-2 Mpro and PLpro. We utilized a genetic algorithm approach for finding optimal candidates using the generation and scoring capabilities of the language model. Our generalizable models accelerate the identification of inhibitors for emerging therapeutic targets.

5.
J Biol Chem ; 295(4): 1105-1119, 2020 01 24.
Article in English | MEDLINE | ID: mdl-31836666

ABSTRACT

Neurofibromin is a tumor suppressor encoded by the NF1 gene, which is mutated in the RASopathy neurofibromatosis type I. Defects in NF1 lead to aberrant signaling through the RAS-mitogen-activated protein kinase pathway due to disruption of the neurofibromin GTPase-activating function on RAS family small GTPases. Very little is known about the function of most of the neurofibromin protein; to date, biochemical and structural data exist only for its GAP domain and a region containing a Sec-PH motif. To better understand the role of this large protein, here we carried out a series of biochemical and biophysical experiments, including size-exclusion chromatography-multiangle light scattering (SEC-MALS), small-angle X-ray and neutron scattering, and analytical ultracentrifugation, indicating that full-length neurofibromin forms a high-affinity dimer. We observed that neurofibromin dimerization also occurs in human cells and likely has biological and clinical implications. Analysis of purified full-length and truncated neurofibromin variants by negative-stain EM revealed the overall architecture of the dimer and predicted the potential interactions that contribute to the dimer interface. We could reconstitute structures resembling high-affinity full-length dimers by mixing N- and C-terminal protein domains in vitro. The reconstituted neurofibromin was capable of GTPase activation in vitro, and co-expression of the two domains in human cells effectively recapitulated the activity of full-length neurofibromin. Taken together, these results suggest how neurofibromin dimers might form and be stabilized within the cell.


Subjects
Neurofibromin 1/chemistry, Neurofibromin 1/metabolism, Protein Multimerization, HEK293 Cells, Humans, Neurofibromin 1/ultrastructure, Protein Domains, Structure-Activity Relationship, ras GTPase-Activating Proteins/metabolism
6.
J Chem Inf Model ; 61(6): 3058-3073, 2021 06 28.
Article in English | MEDLINE | ID: mdl-34124899

ABSTRACT

β-coronaviruses (β-CoVs) alone have been responsible for three major global outbreaks in the 21st century. The current crisis has led to an urgent requirement to develop therapeutics. Even though a number of vaccines are available, alternative strategies targeting essential viral components are required as a backup against the emergence of lethal viral variants. One such target is the main protease (Mpro) that plays an indispensable role in viral replication. The availability of over 270 Mpro X-ray structures in complex with inhibitors provides unique insights into ligand-protein interactions. Herein, we provide a comprehensive comparison of all nonredundant ligand-binding sites available for SARS-CoV-2, SARS-CoV, and MERS-CoV Mpro. Extensive adaptive sampling has been used to investigate structural conservation of ligand-binding sites using Markov state models (MSMs) and compare conformational dynamics employing convolutional variational auto-encoder-based deep learning. Our results indicate that not all ligand-binding sites are dynamically conserved despite high sequence and structural conservation across β-CoV homologs. This highlights the complexity in targeting all three Mpro enzymes with a single pan-inhibitor.


Subjects
COVID-19, Peptide Hydrolases, Antiviral Agents, Binding Sites, Humans, Ligands, Protease Inhibitors, Viral RNA, SARS-CoV-2
7.
Biochem J ; 477(15): 2791-2805, 2020 08 14.
Article in English | MEDLINE | ID: mdl-32657326

ABSTRACT

Glycosylation of secondary metabolites involves plant UDP-dependent glycosyltransferases (UGTs). UGTs have shown promise as catalysts in the synthesis of glycosides for medical treatment. However, limited understanding at the molecular level due to insufficient biochemical and structural information has hindered potential applications of most of these UGTs. In the absence of experimental crystal structures, we employed advanced molecular modeling and simulations in conjunction with biochemical characterization to design a workflow to study five Group H Arabidopsis thaliana UGTs (76E1, 76E2, 76E4, 76E5, 76D1). Based on our rational structural manipulation and analysis, we identified key amino acids (P129 in 76D1; D374 in 76E2; K275 in 76E4) which, when mutated, improved donor substrate recognition relative to wild-type UGTs. Molecular dynamics simulations and deep learning analysis identified structural differences, which drive substrate preferences. The design of these UGTs with broader substrate specificity may play an important role in biotechnological and industrial applications. These findings can also serve as a basis for studying other plant UGTs, thereby advancing UGT enzyme engineering.


Subjects
Arabidopsis Proteins/chemistry, Arabidopsis Proteins/metabolism, Glycosyltransferases/chemistry, Glycosyltransferases/metabolism, Protein Engineering/methods, Arabidopsis Proteins/genetics, Deep Learning, Glucosyltransferases/chemistry, Glucosyltransferases/genetics, Glucosyltransferases/metabolism, Glycosyltransferases/genetics, Molecular Models, Molecular Dynamics Simulation, Site-Directed Mutagenesis, Recombinant Proteins/chemistry, Recombinant Proteins/genetics, Recombinant Proteins/metabolism, Protein Structural Homology, Structure-Activity Relationship, Substrate Specificity
8.
BMC Bioinformatics ; 19(Suppl 18): 484, 2018 Dec 21.
Article in English | MEDLINE | ID: mdl-30577777

ABSTRACT

BACKGROUND: We examine the problem of clustering biomolecular simulations using deep learning techniques. Since biomolecular simulation datasets are inherently high dimensional, it is often necessary to build low dimensional representations that can be used to extract quantitative insights into the atomistic mechanisms that underlie complex biological processes. RESULTS: We use a convolutional variational autoencoder (CVAE) to learn low dimensional, biophysically relevant latent features from long time-scale protein folding simulations in an unsupervised manner. We demonstrate our approach on three model protein folding systems, namely Fs-peptide (14 µs aggregate sampling), villin head piece (single trajectory of 125 µs) and β-β-α (BBA) protein (223 + 102 µs sampling across two independent trajectories). In these systems, we show that the CVAE latent features learned correspond to distinct conformational substates along the protein folding pathways. The CVAE model predicts, on average, nearly 89% of all contacts within the folding trajectories correctly, while being able to extract folded, unfolded and potentially misfolded states in an unsupervised manner. Further, the CVAE model can be used to learn latent features of protein folding that can be applied to other independent trajectories, making it particularly attractive for identifying intrinsic features that correspond to conformational substates that share similar structural features. CONCLUSIONS: Together, we show that the CVAE model can quantitatively describe complex biophysical processes such as protein folding.


Subjects
Protein Folding, Cluster Analysis, Molecular Dynamics Simulation
9.
Proc Natl Acad Sci U S A ; 112(45): 13886-91, 2015 Nov 10.
Article in English | MEDLINE | ID: mdl-26504206

ABSTRACT

Inorganic pyrophosphatase (IPPase) from Thermococcus thioreducens is a large oligomeric protein derived from a hyperthermophilic microorganism that is found near hydrothermal vents deep under the sea, where the pressure is up to 100 MPa (1 kbar). It has attracted great interest in biophysical research because of its high activity under extreme conditions in the seabed. In this study, we use the quasielastic neutron scattering (QENS) technique to investigate the effects of pressure on the conformational flexibility and relaxation dynamics of IPPase over a wide temperature range. The β-relaxation dynamics of proteins was studied in the time ranges from 2 to 25 ps, and from 100 ps to 2 ns, using two spectrometers. Our results indicate that, under a pressure of 100 MPa, close to that of the native environment deep under the sea, IPPase displays much faster relaxation dynamics than a mesophilic model protein, hen egg white lysozyme (HEWL), at all measured temperatures, opposite to what we observed previously under ambient pressure. This contradictory observation provides evidence that the protein energy landscape is distorted by high pressure, which is significantly different for hyperthermophilic (IPPase) and mesophilic (HEWL) proteins. We further derive from our observations a schematic denaturation phase diagram together with energy landscapes for the two very different proteins, which can be used as a general picture to understand the dynamical properties of thermophilic proteins under pressure.


Subjects
Archaeal Proteins/chemistry, Biopolymers/chemistry, Marine Biology, Pressure, Thermococcus/enzymology
10.
Phys Chem Chem Phys ; 16(26): 13447-57, 2014 Jul 14.
Article in English | MEDLINE | ID: mdl-24887596

ABSTRACT

Molecular simulations have allowed us to probe the atomic details of aqueous solutions of tetramethylammonium (TMA) and tetrabutylammonium (TBA) bromide, across a wide range of concentrations (0.5 to 3-4 molal). We highlight the space-filling (TMA⁺) versus penetrable (TBA⁺) nature of these polyatomic cations and its consequence for ion hydration, ion dynamics and ion-ion interactions. A well-established hydration is seen for both TMA⁺ and TBA⁺ throughout the concentration range studied. A clear penetration of water molecules, as well as counterions, between the hydrocarbon arms of TBA⁺, which remain in an extended configuration, is seen. Global rotation of individual TBA⁺ points towards isolated rather than aggregated ions (from dilute up to 1 m concentration). Only for highly concentrated solutions, in which inter-penetration of adjacent TBA⁺ ions cannot be avoided, does the rotational time increase dramatically. From both structural and dynamic data we conclude that there is no hydrophobicity-driven cation-cation aggregation in either of the TMABr and TBABr solutions studied. The link between these real systems and the theoretical predictions for spherical hydrophobic solutes of varying size does not seem straightforward.

11.
Patterns (N Y) ; 5(4): 100947, 2024 Apr 12.
Article in English | MEDLINE | ID: mdl-38645768

ABSTRACT

This study examines the effectiveness of generative models in drug discovery, material science, and polymer science, aiming to overcome constraints associated with traditional inverse design methods relying on heuristic rules. Generative models generate synthetic data resembling real data, enabling deep learning model training without extensive labeled datasets. They prove valuable in creating virtual libraries of molecules for material science and facilitating drug discovery by generating molecules with specific properties. While generative adversarial networks (GANs) are explored for these purposes, mode collapse restricts their efficacy, limiting novel structure variability. To address this, we introduce a masked language model (LM) inspired by natural language processing. Although LMs alone can have inherent limitations, we propose a hybrid architecture combining LMs and GANs to efficiently generate new molecules, demonstrating superior performance over standalone masked LMs, particularly for smaller population sizes. This hybrid LM-GAN architecture enhances efficiency in optimizing properties and generating novel samples.

12.
Protein Sci ; 32(10): e4772, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37646172

ABSTRACT

Characterizing structural ensembles of intrinsically disordered proteins (IDPs) and intrinsically disordered regions (IDRs) of proteins is essential for studying structure-function relationships. Due to the different neutron scattering lengths of hydrogen and deuterium, selective labeling and contrast matching in small-angle neutron scattering (SANS) becomes an effective tool to study dynamic structures of disordered systems. However, experimental timescales typically capture measurements averaged over multiple conformations, leaving complex SANS data for disentanglement. We hereby demonstrate an integrated method to elucidate the structural ensemble of a complex formed by two IDRs. We use data from both full contrast and contrast matching with residue-specific deuterium labeling SANS experiments, microsecond all-atom molecular dynamics (MD) simulations with four molecular mechanics force fields, and an autoencoder-based deep learning (DL) algorithm. From our combined approach, we show that selective deuteration provides additional information that helps characterize structural ensembles. We find that among the four force fields, a99SB-disp and CHARMM36m show the strongest agreement with SANS and NMR experiments. In addition, our DL algorithm not only complements conventional structural analysis methods but also successfully differentiates NMR and MD structures which are indistinguishable on the free energy surface. Lastly, we present an ensemble that describes experimental SANS and NMR data better than MD ensembles generated by one single force field and reveal three clusters of distinct conformations. Our results demonstrate a new integrated approach for characterizing structural ensembles of IDPs.

13.
Sci Rep ; 13(1): 20031, 2023 Nov 16.
Article in English | MEDLINE | ID: mdl-37973879

ABSTRACT

The inverse design of novel molecules with a desirable optoelectronic property requires consideration of the vast chemical spaces associated with varying chemical composition and molecular size. First principles-based property predictions have become increasingly helpful for assisting the selection of promising candidate chemical species for subsequent experimental validation. However, a brute-force computational screening of the entire chemical space is decidedly impossible. To alleviate the computational burden and accelerate rational molecular design, we here present an iterative deep learning workflow that combines (i) the density-functional tight-binding method for dynamic generation of property training data, (ii) a graph convolutional neural network surrogate model for rapid and reliable predictions of chemical and physical properties, and (iii) a masked language model. As proof of principle, we employ our workflow in the iterative generation of novel molecules with a target energy gap between the highest occupied molecular orbital (HOMO) and the lowest unoccupied molecular orbital (LUMO).

14.
J Cheminform ; 15(1): 59, 2023 Jun 08.
Article in English | MEDLINE | ID: mdl-37291633

ABSTRACT

The vast size of chemical space necessitates computational approaches to automate and accelerate the design of molecular sequences to guide experimental efforts for drug discovery. Genetic algorithms provide a useful framework to incrementally generate molecules by applying mutations to known chemical structures. Recently, masked language models have been applied to automate the mutation process by leveraging large compound libraries to learn commonly occurring chemical sequences (i.e., using tokenization) and predict rearrangements (i.e., using mask prediction). Here, we consider how language models can be adapted to improve molecule generation for different optimization tasks. We use two different generation strategies for comparison, fixed and adaptive. The fixed strategy uses a pre-trained model to generate mutations; the adaptive strategy trains the language model on each new generation of molecules selected for target properties during optimization. Our results show that the adaptive strategy allows the language model to more closely fit the distribution of molecules in the population. Therefore, for enhanced fitness optimization, we suggest the use of the fixed strategy during an initial phase followed by the use of the adaptive strategy. We demonstrate the impact of adaptive training by searching for molecules that optimize both heuristic metrics, drug-likeness and synthesizability, as well as predicted protein binding affinity from a surrogate model. Our results show that the adaptive strategy provides a significant improvement in fitness optimization compared to the fixed pre-trained model, empowering the application of language models to molecular design tasks.
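
The fixed-then-adaptive schedule described in this abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `ToyMaskedModel` is a token-frequency table standing in for a real masked language model, sequences are plain strings with one character per token, and `optimize` and `adaptive_after` are hypothetical names. It is not the authors' implementation.

```python
import random
from collections import Counter


class ToyMaskedModel:
    """Hypothetical stand-in for a masked LM: fills a mask by sampling
    a token in proportion to its frequency in the training corpus."""

    def __init__(self, corpus):
        self.fit(corpus)

    def fit(self, corpus):
        # "Training" here is just counting single-character tokens.
        self.counts = Counter(tok for seq in corpus for tok in seq)

    def fill_mask(self, rng):
        tokens = sorted(self.counts)
        weights = [self.counts[t] for t in tokens]
        return rng.choices(tokens, weights=weights, k=1)[0]


def mutate(seq, model, rng):
    # Mask one random position and let the model predict a replacement.
    pos = rng.randrange(len(seq))
    return seq[:pos] + model.fill_mask(rng) + seq[pos + 1:]


def optimize(population, fitness, generations, adaptive_after, seed=0):
    rng = random.Random(seed)
    model = ToyMaskedModel(population)  # plays the role of the pre-trained model
    size = len(population)
    for gen in range(generations):
        children = [mutate(s, model, rng) for s in population]
        # Elitist selection: keep the fittest unique sequences.
        pool = sorted(set(population + children))
        population = sorted(pool, key=fitness, reverse=True)[:size]
        # Fixed phase: the model is left untouched.
        # Adaptive phase: retrain on each newly selected generation.
        if gen >= adaptive_after:
            model.fit(population)
    return population, model
```

Calling `optimize(pop, fitness, generations=30, adaptive_after=10)` would run ten generations with the fixed model before switching to adaptive retraining, which mirrors the ordering the abstract suggests.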

15.
Elife ; 12, 2023 02 24.
Article in English | MEDLINE | ID: mdl-36826989

ABSTRACT

β-Lactam antibiotics are the most important and widely used antibacterial agents across the world. However, the widespread dissemination of β-lactamases among pathogenic bacteria limits the efficacy of β-lactam antibiotics. This has created a major public health crisis. The use of β-lactamase inhibitors has proven useful in restoring the activity of β-lactam antibiotics, yet effective clinically approved inhibitors against class B metallo-β-lactamases are not available. L1, a class B3 enzyme expressed by Stenotrophomonas maltophilia, is a significant contributor to the β-lactam resistance displayed by this opportunistic pathogen. Structurally, L1 is a tetramer with two elongated loops, α3-β7 and β12-α5, present around the active site of each monomer. Residues in these two loops influence substrate/inhibitor binding. To study how the conformational changes of the elongated loops affect the active site in each monomer, enhanced sampling molecular dynamics simulations were performed, Markov State Models were built, and convolutional variational autoencoder-based deep learning was applied. The key identified residues (D150a, H151, P225, Y227, and R236) were mutated and the activity of the generated L1 variants was evaluated in cell-based experiments. The results demonstrate that there are extremely significant gating interactions between α3-β7 and β12-α5 loops. Taken together, the gating interactions with the conformational changes of the key residues play an important role in the structural remodeling of the active site. These observations offer insights into the potential for novel drug development exploiting these gating interactions.


Subjects
Anti-Bacterial Agents, beta-Lactamases, Catalytic Domain, Anti-Bacterial Agents/pharmacology, beta-Lactamases/metabolism, Penicillins
16.
Phys Chem Chem Phys ; 14(37): 12898-904, 2012 Oct 05.
Article in English | MEDLINE | ID: mdl-22899253

ABSTRACT

Aqueous solutions of ionenes with bromide and fluoride counterions have been investigated using small angle neutron scattering for the first time. Ionenes are a class of cationic polyelectrolytes based on quaternary ammonium atoms and, considering the very low solubility of their uncharged part (hydrocarbon chain), would be formally classified as hydrophobic. Ionenes present important structural differences over previously studied polyelectrolytes: (a) charge is located on the polyelectrolyte backbone, (b) the distance between charges is regular and tunable by synthesis, (c) hydrophobicity comes from methylene groups of the backbone and not from bulky side groups. Results for Br ionenes feature a disappearance of the well-known polyelectrolyte peak beyond a given monomer concentration. Below this concentration, the position of the peak depends on the chain charge density, f_chem, and scales as f_chem^(0.30±0.04). This is an indication of a hydrophilic character of the ionene backbone. In addition, osmotic coefficients of ionene solutions resemble again other hydrophilic polyelectrolytes, featuring no unusual increase in the water activity (or a significant counterion condensation). We conclude that despite the hydrophobicity of the hydrocarbon chain separating charged centers on ionenes, these chains behave as hydrophilic. In contrast to Br ionenes, the polyelectrolyte peak remains at all concentrations studied for the single F ionene investigated. This strong counterion effect is rationalized in terms of the different hydrating properties and ion pairing in the case of bromide and fluoride ions.

17.
IEEE Trans Pattern Anal Mach Intell ; 44(10): 7112-7127, 2022 10.
Article in English | MEDLINE | ID: mdl-34232869

ABSTRACT

Computational biology and bioinformatics provide vast data gold-mines from protein sequences, ideal for Language Models (LMs) taken from Natural Language Processing (NLP). These LMs reach for new prediction frontiers at low inference costs. Here, we trained two auto-regressive models (Transformer-XL, XLNet) and four auto-encoder models (BERT, Albert, Electra, T5) on data from UniRef and BFD containing up to 393 billion amino acids. The protein LMs (pLMs) were trained on the Summit supercomputer using 5616 GPUs and on a TPU Pod with up to 1024 cores. Dimensionality reduction revealed that the raw pLM-embeddings from unlabeled data captured some biophysical features of protein sequences. We validated the advantage of using the embeddings as exclusive input for several subsequent tasks: (1) a per-residue (per-token) prediction of protein secondary structure (3-state accuracy Q3=81%-87%); (2) per-protein (pooling) predictions of protein sub-cellular location (ten-state accuracy: Q10=81%) and membrane versus water-soluble (2-state accuracy Q2=91%). For secondary structure, the most informative embeddings (ProtT5) for the first time outperformed the state-of-the-art without multiple sequence alignments (MSAs) or evolutionary information, thereby bypassing expensive database searches. Taken together, the results implied that pLMs learned some of the grammar of the language of life. All our models are available through https://github.com/agemagician/ProtTrans.


Subjects
Algorithms, Natural Language Processing, Computational Biology/methods, Proteins/chemistry, Supervised Machine Learning
18.
J Cheminform ; 13(1): 14, 2021 Feb 23.
Article in English | MEDLINE | ID: mdl-33622401

ABSTRACT

The process of drug discovery involves a search over the space of all possible chemical compounds. Generative Adversarial Networks (GANs) provide a valuable tool towards exploring chemical space and optimizing known compounds for a desired functionality. Standard approaches to training GANs, however, can result in mode collapse, in which the generator primarily produces samples closely related to a small subset of the training data. In contrast, the search for novel compounds necessitates exploration beyond the original data. Here, we present an approach to training GANs that promotes incremental exploration and limits the impacts of mode collapse using concepts from Genetic Algorithms. In our approach, valid samples from the generator are used to replace samples from the training data. We consider both random and guided selection along with recombination during replacement. By tracking the number of novel compounds produced during training, we show that updates to the training data drastically outperform the traditional approach, increasing potential applications for GANs in drug discovery.
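
The training-data replacement step described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's code: `update_training_data`, `is_valid`, and `score` are hypothetical names, "guided" selection is assumed to mean overwriting the lowest-scoring training samples first, and the recombination step mentioned in the abstract is omitted.

```python
import random


def update_training_data(training, generated, is_valid, score=None, rng=None):
    """Replace training samples with valid, novel generator outputs.

    score=None  -> random selection of the samples to overwrite.
    score=func  -> guided selection (assumed here: lowest score replaced first).
    Returns the updated list and the number of novel samples found.
    """
    rng = rng or random.Random(0)
    seen = set(training)
    novel = []
    for sample in generated:
        # Keep only valid samples that are new to the training data.
        if is_valid(sample) and sample not in seen:
            seen.add(sample)
            novel.append(sample)

    updated = list(training)
    if score is None:
        count = min(len(novel), len(updated))
        positions = rng.sample(range(len(updated)), count)
    else:
        ranked = sorted(range(len(updated)), key=lambda i: score(updated[i]))
        positions = ranked[:len(novel)]

    for pos, sample in zip(positions, novel):
        updated[pos] = sample
    return updated, len(novel)
```

Calling this once per training epoch and summing the second return value gives a running count of novel compounds, which is the quantity the abstract reports tracking.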

19.
Front Mol Biosci ; 8: 710623, 2021.
Article in English | MEDLINE | ID: mdl-34604302

ABSTRACT

Hemocyanin from horseshoe crab in its active form is a homo-hexameric protein. It exists in open and closed conformations when transitioning between deoxygenated and oxygenated states. Here, we present a detailed dynamic atomistic investigation of the oxygenated and deoxygenated states of the hexameric hemocyanin using explicit solvent molecular dynamics simulations. We focus on the variation in solvent cavities and the formation of tunnels in the two conformational states. By employing principal component analysis and CVAE-based deep learning, we are able to differentiate between the dynamics of the deoxy- and oxygenated states of hemocyanin. Finally, our results identify the deoxygenated open conformation, which adopts a stable, closed conformation after the oxygenation process.

20.
Front Microbiol ; 12: 720991, 2021.
Article in English | MEDLINE | ID: mdl-34621251

ABSTRACT

Class A β-lactamases are known for being able to rapidly gain broad spectrum catalytic efficiency against most β-lactamase inhibitor combinations as a result of elusively minor point mutations. The evolution in class A β-lactamases occurs through optimisation of their dynamic phenotypes at different timescales. At long timescales, certain conformations are more catalytically permissive than others, while at short timescales, fine-grained optimisation of free energy barriers can improve efficiency in ligand processing by the active site. Free energy barriers, which define all coordinated movements, depend on the flexibility of the secondary structural elements. The most highly conserved residues in class A β-lactamases are hydrophobic nodes that stabilize the core. To assess how the stable hydrophobic core is linked to the structural dynamics of the active site, we carried out adaptively sampled molecular dynamics (MD) simulations in four representative class A β-lactamases (KPC-2, SME-1, TEM-1, and SHV-1). Using Markov State Models (MSM) and unsupervised deep learning, we show that the dynamics of the hydrophobic nodes is used as a metastable relay of kinetic information within the core and is coupled with the catalytically permissive conformation of the active site environment. Our results collectively demonstrate that the class A enzymes described here share several important dynamic similarities and that the hydrophobic nodes comprise an informative set of dynamic variables in representative class A β-lactamases.
