ABSTRACT
In mammalian cells, two phosphatidylserine (PS) synthases drive PS synthesis. Gain-of-function mutations in the Ptdss1 gene lead to heightened PS production, causing Lenz-Majewski syndrome (LMS). Recently, pharmacological inhibition of PSS1 has been shown to suppress tumorigenesis. Here, we report the cryo-EM structures of wild-type human PSS1 (PSS1WT), the LMS-causing Pro269Ser mutant (PSS1P269S), and PSS1WT in complex with its inhibitor DS55980254. PSS1 contains 10 transmembrane helices (TMs), with TMs 4-8 forming a catalytic core in the luminal leaflet. These structures revealed a working mechanism of PSS1 akin to the postulated mechanisms of the membrane-bound O-acyltransferase family. Additionally, we showed that both PS and DS55980254 can allosterically inhibit PSS1 and that inhibition by DS55980254 activates the SREBP pathways, thus enhancing the expression of LDL receptors and increasing cellular LDL uptake. This work uncovers a mechanism of mammalian PS synthesis and suggests that selective PSS1 inhibitors have the potential to lower blood cholesterol levels.
Subjects
Phosphatidylserines , Humans , Phosphatidylserines/metabolism , Cryoelectron Microscopy , LDL Lipoproteins/metabolism , LDL Receptors/metabolism , CDPdiacylglycerol-Serine O-Phosphatidyltransferase/metabolism , CDPdiacylglycerol-Serine O-Phosphatidyltransferase/genetics , Animals , HEK293 Cells
ABSTRACT
Directed evolution facilitates enzyme engineering via iterative rounds of mutagenesis. Despite the wide applications of high-throughput screening, building "smart libraries" to effectively identify beneficial variants remains a major challenge in the community. Here, we developed a new computational directed evolution protocol based on EnzyHTP, a software that we have previously reported to automate enzyme modeling. To enhance the throughput efficiency, we implemented an adaptive resource allocation strategy that dynamically allocates different types of computing resources (e.g., GPU/CPU) based on the specific need of an enzyme modeling subtask in the workflow. We implemented the strategy as a Python library and tested the library using fluoroacetate dehalogenase as a model enzyme. The results show that compared to fixed resource allocation where both CPU and GPU are on-call for use during the entire workflow, applying adaptive resource allocation can save 87% CPU hours and 14% GPU hours. Furthermore, we constructed a computational directed evolution protocol under the framework of adaptive resource allocation. The workflow was tested against two rounds of mutational screening in the directed evolution experiments of Kemp eliminase (KE07) with a total of 184 mutants. Using folding stability and electrostatic stabilization energy as computational readout, we identified all four experimentally observed target variants. Enabled by the workflow, the entire computation task (i.e., 18.4 µs MD and 18,400 QM single-point calculations) completes in 3 days of wall-clock time using ~30 GPUs and ~1000 CPUs.
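The contrast between fixed and adaptive allocation can be illustrated with a toy accounting model. The sketch below is not the EnzyHTP implementation: the `Subtask` class, the three functions, and every number in the usage lines are hypothetical and chosen only to show how holding both resource pools for the whole workflow inflates the billed hours. It does not reproduce the 87% and 14% savings reported above.

```python
from dataclasses import dataclass


@dataclass
class Subtask:
    """One step of a modeling workflow (hypothetical structure)."""
    name: str
    device: str       # "gpu" or "cpu": the resource this step actually uses
    n_devices: int    # how many devices of that type it needs
    hours: float      # wall-clock duration of the step


def fixed_allocation_hours(tasks, n_cpu, n_gpu):
    """Both pools are reserved for the entire (sequential) workflow."""
    wall = sum(t.hours for t in tasks)
    return {"cpu": n_cpu * wall, "gpu": n_gpu * wall}


def adaptive_allocation_hours(tasks):
    """Each step requests only the devices it needs, for its own duration."""
    used = {"cpu": 0.0, "gpu": 0.0}
    for t in tasks:
        used[t.device] += t.n_devices * t.hours
    return used


def fractional_saving(fixed, adaptive):
    """Fraction of device hours saved per resource type."""
    return {k: 1.0 - adaptive[k] / fixed[k] for k in fixed}


# Toy workflow: a GPU-bound MD step followed by a CPU-bound QM step.
tasks = [Subtask("md", "gpu", 1, 8.0), Subtask("qm", "cpu", 16, 2.0)]
fixed = fixed_allocation_hours(tasks, n_cpu=16, n_gpu=1)
adaptive = adaptive_allocation_hours(tasks)
print(fractional_saving(fixed, adaptive))
```

As a consistency check on the reported workload, and assuming it is spread evenly over the 184 mutants, 18.4 µs of MD corresponds to 100 ns per mutant and 18,400 QM single points correspond to 100 per mutant.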
Subjects
High-Throughput Screening Assays , Resource Allocation , Gene Library , Mutagenesis , Mutation
ABSTRACT
Lasso peptides are a subclass of ribosomally synthesized and post-translationally modified peptides with a slipknot conformation. With superior thermal stability, protease resistance, and antimicrobial activity, lasso peptides are promising candidates for bioengineering and pharmaceutical applications. To enable high-throughput computational prediction and design of lasso peptides, we developed a software, LassoHTP, for automatic lasso peptide structure construction and modeling. LassoHTP consists of three modules, including the scaffold constructor, mutant generator, and molecular dynamics (MD) simulator. With a user-provided sequence and conformational annotation, LassoHTP can either generate the structure and conformational ensemble as is or conduct random mutagenesis. We used LassoHTP to construct eight known lasso peptide structures de novo and to simulate their conformational ensembles for 100 ns MD simulations. For benchmarking, we calculated the root mean square deviation (RMSD) of these ensembles with reference to their experimental crystal or NMR PDB structures; we also compared these RMSD values against those of the MD ensembles that are initiated from the PDB structures. Dihedral principal component analysis was also conducted. The results show that the LassoHTP-initiated ensembles are similar to those of the PDB-initiated ensembles. LassoHTP offers a computational platform to develop strategies for lasso peptide prediction and design.
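The RMSD benchmark described above can be sketched in a few lines. The abstract does not state how structures were superimposed, so the optimal rigid-body alignment (Kabsch algorithm) used here is an assumption, and the function name `kabsch_rmsd` is hypothetical; it is not part of LassoHTP.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (n_atoms, 3) coordinate sets after optimal superposition.

    P is translated and rotated onto Q with the Kabsch algorithm.
    Atom ordering is assumed to be identical in both sets.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    Pc = P - P.mean(axis=0)          # remove translation
    Qc = Q - Q.mean(axis=0)
    H = Pc.T @ Qc                    # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])       # guard against improper rotations
    R = Vt.T @ D @ U.T               # rotation that maps P onto Q
    P_rot = Pc @ R.T
    return float(np.sqrt(np.mean(np.sum((P_rot - Qc) ** 2, axis=1))))
```

Applied frame by frame against a reference structure, a function like this yields the RMSD time series that can then be compared between ensembles started from different initial structures.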
Subjects
Molecular Dynamics Simulation , Peptides , Peptides/chemistry , Software , Molecular Conformation , Magnetic Resonance Spectroscopy
ABSTRACT
Molecular simulations, including quantum mechanics (QM), molecular mechanics (MM), and multiscale QM/MM modeling, have been extensively applied to understand the mechanism of enzyme catalysis and to design new enzymes. However, molecular simulations typically require specialized, manual operation ranging from model construction to data analysis to complete the entire life cycle of enzyme modeling. The dependence on manual operation makes it challenging to simulate enzymes and enzyme variants in a high-throughput fashion. In this work, we developed a Python software package, EnzyHTP, to automate molecular model construction, QM, MM, and QM/MM computation, and analyses of modeling data for enzyme simulations. To test EnzyHTP, we used fluoroacetate dehalogenase (FAcD) as a model system and simulated the enzyme interior electrostatics for 100 FAcD mutants, each with a random single amino acid substitution. For each enzyme mutant, the workflow involves structural model construction, 1 ns molecular dynamics (MD) simulations, and quantum mechanical calculations on 100 MD-sampled snapshots. The entire simulation workflow for 100 mutants was completed in 7 h with 10 GPUs and 160 CPUs. EnzyHTP improves the efficiency of computational enzyme modeling, setting a basis for high-throughput identification of function-enhancing enzymes and enzyme variants. The software is expected to facilitate the fundamental understanding of catalytic origins across enzyme families and to accelerate the optimization of biocatalysts for non-native substrates.
Subjects
Molecular Dynamics Simulation , Quantum Theory , Catalysis , Humans , Software , Static Electricity
ABSTRACT
The enzyme NgnD catalyzes an ambimodal cycloaddition that bifurcates to [6+4]- and [4+2]-adducts. Both products have been isolated in experiments, but it remains unknown how enzyme and water influence the bifurcation selectivity at the femtosecond time scale. Here, we study the impact of water and enzyme on the post-transition state bifurcation of NgnD-catalyzed [6+4]/[4+2] cycloaddition by integrating quantum mechanics/molecular mechanics quasiclassical dynamics simulations and biochemical assays. The ratio of [6+4]/[4+2] products significantly differs in the gas phase, water, and enzyme. Biochemical assays were employed to validate computational predictions. The study informs how water and enzyme affect the bifurcation selectivity through perturbation of the reaction dynamics in the femtosecond time scale, revealing the fundamental roles of condensed media in dynamically controlling the chemical selectivity for biosynthetic reactions.
Subjects
Bacterial Proteins/chemistry , Carbon-Carbon Lyases/chemistry , Water/chemistry , Bacterial Proteins/metabolism , Biocatalysis , Carbon-Carbon Lyases/metabolism , Catalytic Domain , Cycloaddition Reaction , Density Functional Theory , Lactones/chemistry , Lactones/metabolism , Chemical Models , Molecular Dynamics Simulation , Nocardia/enzymology , Protein Binding
ABSTRACT
Vibrational spectroscopy, in particular infrared spectroscopy, has been widely used to probe the three-dimensional structures and conformational dynamics of nucleic acids. As commonly used chromophores, the C=O and C=C stretch modes in the nucleobases exhibit distinct spectral features for different base pairing and stacking configurations. To elucidate the origin of their structural sensitivity, in this work, we develop transition charge coupling (TCC) models that allow one to efficiently calculate the interactions or couplings between the C=O and C=C chromophores based on the geometric arrangements of the nucleobases. To evaluate their performances, we apply the TCC models to DNA and RNA oligonucleotides with a variety of secondary and tertiary structures and demonstrate that the predicted couplings are in quantitative agreement with the reference values. We further elucidate how the interactions between the paired and stacked bases give rise to characteristic IR absorption peaks and show that the TCC models provide more reliable predictions of the coupling constants as compared to the transition dipole coupling scheme. The TCC models, together with our recently developed through-bond coupling constants and vibrational frequency maps, provide an effective theoretical strategy to model the vibrational Hamiltonian, and hence the vibrational spectra of nucleic acids in the base carbonyl stretch region directly from atomistic molecular simulations.
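To make the comparison between the two coupling schemes concrete, the sketch below contrasts the standard point-dipole expression used in transition dipole coupling with a simplified pairwise Coulomb sum over atom-centered transition charges. The abstract does not give the functional form of the TCC models, so the second function is an assumed, reduced form (published TCC models may include additional terms). The function names are hypothetical, and the Coulomb prefactor is set to 1, so the results are in arbitrary units.

```python
import numpy as np


def tdc_coupling(mu_i, mu_j, r_ij):
    """Point-dipole coupling between two transition dipoles separated by r_ij."""
    mu_i = np.asarray(mu_i, dtype=float)
    mu_j = np.asarray(mu_j, dtype=float)
    r_ij = np.asarray(r_ij, dtype=float)
    r = np.linalg.norm(r_ij)
    return float(mu_i @ mu_j / r**3 - 3.0 * (mu_i @ r_ij) * (mu_j @ r_ij) / r**5)


def tcc_coupling(charges_i, pos_i, charges_j, pos_j):
    """Simplified transition-charge coupling: Coulomb sum over all atom pairs."""
    q_i = np.asarray(charges_i, dtype=float)
    q_j = np.asarray(charges_j, dtype=float)
    pos_i = np.asarray(pos_i, dtype=float)
    pos_j = np.asarray(pos_j, dtype=float)
    diff = pos_i[:, None, :] - pos_j[None, :, :]   # all pairwise separation vectors
    dist = np.linalg.norm(diff, axis=-1)
    return float(np.sum(np.outer(q_i, q_j) / dist))
```

When two chromophores are far apart relative to their own size, the two expressions agree. At the short distances typical of paired and stacked bases, the charge-based sum retains the spatial extent of each chromophore, which the point-dipole form discards.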
Subjects
DNA/chemistry , Chemical Models , RNA/chemistry , Nucleic Acid Conformation , Infrared Spectrophotometry , Vibration
ABSTRACT
Omega-3 dietary supplements provide a rich source of the active moieties eicosapentaenoic acid (EPA) and docosahexaenoic acid (DHA), which exist in the form of triacylglycerols or ethyl esters. Infrared (IR) spectroscopy provides a rapid and quantitative tool to assess the quality of these products as specific normal modes, in particular the ester carbonyl stretch modes, exhibit characteristic spectral features for the two ester forms of omega-3 fatty acids. To uncover the origin of the observed spectra, in this work, we perform molecular dynamics simulations of EPA and DHA ethyl esters and triacylglycerols to characterize their conformation, packing, and dynamics in the liquid phase and use a mixed quantum/classical approach to calculate their IR absorption spectra in the ester carbonyl stretch region. We show that the ester liquids exhibit slow dynamics in spectral diffusion and translational and rotational motion, consistent with the diffusion ordered NMR spectroscopy measurements. We further demonstrate that the predicted IR spectra are in good agreement with experiments and reveal how a competition between intermolecular and intramolecular interactions gives rise to distinct absorption peaks for the fatty acid esters.
Subjects
Docosahexaenoic Acids/chemistry , Eicosapentaenoic Acid/chemistry , Esters/chemistry , Molecular Dynamics Simulation , Infrared Spectrophotometry , Diffusion , Molecular Conformation
ABSTRACT
Hydrolase-catalyzed kinetic resolution is a well-established biocatalytic process. However, the computational tools that predict favorable enzyme scaffolds for separating a racemic substrate mixture are underdeveloped. To address this challenge, we trained a deep learning framework, EnzyKR, to automate the selection of hydrolases for stereoselective biocatalysis. EnzyKR adopts a classifier-regressor architecture that first identifies the reactive binding conformer of a substrate-hydrolase complex, and then predicts its activation free energy. A structure-based encoding strategy was used to depict the chiral interactions between hydrolases and enantiomers. Different from existing models trained on protein sequences and substrate SMILES strings, EnzyKR was trained using 204 substrate-hydrolase complexes, which were constructed by docking. EnzyKR was tested using a held-out dataset of 20 complexes on the task of predicting activation free energy. EnzyKR achieved a Pearson correlation coefficient (R) of 0.72, a Spearman rank correlation coefficient (Spearman R) of 0.72, and a mean absolute error (MAE) of 1.54 kcal mol⁻¹ in this task. Furthermore, EnzyKR was tested on the task of predicting enantiomeric excess ratios for 28 hydrolytic kinetic resolution reactions catalyzed by fluoroacetate dehalogenase RPA1163, halohydrin dehalogenase HheC, A. mediolanus epoxide hydrolase, and P. fluorescens esterase. The performance of EnzyKR was compared against that of a recently developed kinetic predictor, DLKcat. EnzyKR correctly predicts the favored enantiomer and outperforms DLKcat in 18 out of 28 reactions, i.e., 64% of the test cases. These results demonstrate EnzyKR to be a new approach for prediction of enantiomeric outcomes in hydrolase-catalyzed kinetic resolution reactions.
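The three evaluation metrics quoted above (Pearson R, Spearman R, and MAE) can be computed as in the minimal sketch below. It is not code from EnzyKR: the function names are hypothetical, the Spearman implementation assumes there are no tied values, and the four data points in the usage lines are made up for illustration.

```python
import numpy as np


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])


def _ranks(values):
    """Ranks 0..n-1 of the values (assumes no ties)."""
    order = np.argsort(values)
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(len(values))
    return ranks


def spearman_r(y_true, y_pred):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(_ranks(np.asarray(y_true)), _ranks(np.asarray(y_pred)))


def mean_absolute_error(y_true, y_pred):
    """Mean absolute error, in the units of the inputs (e.g. kcal/mol)."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(diff)))


# Made-up activation free energies (kcal/mol) for four complexes.
experimental = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 2.0, 5.0]
print(pearson_r(experimental, predicted),
      spearman_r(experimental, predicted),
      mean_absolute_error(experimental, predicted))
```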
ABSTRACT
Identifying function-enhancing enzyme variants is a 'holy grail' challenge in protein science because it will allow researchers to expand the biocatalytic toolbox for late-stage functionalization of drug-like molecules, environmental degradation of plastics and other pollutants, and medical treatment of food allergies. Data-driven strategies, including statistical modeling, machine learning, and deep learning, have greatly advanced the understanding of the sequence-structure-function relationships for enzymes. They have also enhanced the capability of predicting and designing new enzymes and enzyme variants for catalyzing new-to-nature reactions. Here, we reviewed recent progress in data-driven models applied to identifying efficiency-enhancing mutants for catalytic reactions. We also discussed existing challenges and obstacles faced by the community. Although the review is by no means comprehensive, we hope that the discussion can inform readers about the state of the art in data-driven enzyme engineering, inspiring more joint experimental-computational efforts to develop and apply data-driven modeling to innovate biocatalysts for synthetic and pharmaceutical applications.
Subjects
Machine Learning , Proteins , Proteins/metabolism , Biocatalysis , Catalysis , Enzymes/genetics , Enzymes/metabolism , Protein Engineering
ABSTRACT
Substrate positioning dynamics (SPD) orients the substrate in the active site, thereby influencing catalytic efficiency. However, it remains unknown whether SPD effects originate primarily from electrostatic perturbation inside the enzyme or can independently mediate catalysis with a significant non-electrostatic component. In this work, we investigated how the non-electrostatic component of SPD affects transition state (TS) stabilization. Using high-throughput enzyme modeling, we selected Kemp eliminase variants with similar electrostatics inside the enzyme but significantly different SPD. The kinetic parameters of these mutants were experimentally characterized. We observed a valley-shaped, two-segment linear correlation between the TS stabilization free energy (converted from kinetic parameters) and substrate positioning index (a metric to quantify SPD). The energy varies by approximately 2 kcal/mol. Favorable SPD was observed for the distal mutant R154W, increasing the proportion of reactive conformations and leading to the lowest activation free energy. These results indicate the substantial contribution of the non-electrostatic component of SPD to enzyme catalytic efficiency.
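The abstract states that free energies were converted from kinetic parameters but does not give the conversion. A common choice is the Eyring equation from transition state theory; the sketch below assumes that choice, a transmission coefficient of 1, and a temperature of 298.15 K. The function names are hypothetical.

```python
import math

R_KCAL = 1.987204259e-3   # gas constant, kcal/(mol*K)
K_B = 1.380649e-23        # Boltzmann constant, J/K
H_PLANCK = 6.62607015e-34 # Planck constant, J*s


def eyring_barrier(rate_constant, temperature=298.15):
    """Activation free energy (kcal/mol) from a first-order rate constant (1/s)."""
    prefactor = K_B * temperature / H_PLANCK
    return -R_KCAL * temperature * math.log(rate_constant / prefactor)


def barrier_difference(k_reference, k_variant, temperature=298.15):
    """Change in activation free energy (kcal/mol) of a variant relative to a reference.

    Negative values mean the variant has the lower barrier (it is faster).
    """
    return -R_KCAL * temperature * math.log(k_variant / k_reference)


# A tenfold rate increase lowers the barrier by RT*ln(10).
print(barrier_difference(1.0, 10.0))
```

Under these assumptions, the spread of approximately 2 kcal/mol noted above corresponds to roughly a 30-fold difference in rate constant at room temperature.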
Subjects
Static Electricity , Thermodynamics , Catalysis , Catalytic Domain
ABSTRACT
Protein engineering holds immense promise in shaping the future of biomedicine and biotechnology. This Review focuses on our ongoing development of Mutexa, a computational ecosystem designed to enable "intelligent protein engineering". In this vision, researchers will seamlessly acquire sequences of protein variants with desired functions as biocatalysts, therapeutic peptides, and diagnostic proteins through a finely-tuned computational machine, akin to Amazon Alexa's role as a versatile virtual assistant. The technical foundation of Mutexa has been established through the development of a database that combines and relates enzyme structures and their respective functions (e.g., IntEnzyDB), workflow software packages that enable high-throughput protein modeling (e.g., EnzyHTP and LassoHTP), and scoring functions that map the sequence-structure-function relationship of proteins (e.g., EnzyKR and DeepLasso). We will showcase the applications of these tools in benchmarking the convergence conditions of enzyme functional descriptors across mutants, investigating protein electrostatics and cavity distributions in SAM-dependent methyltransferases, and understanding the role of nonelectrostatic dynamic effects in enzyme catalysis. Finally, we will conclude by addressing the future steps and fundamental challenges in our endeavor to develop new Mutexa applications that assist the identification of beneficial mutants in protein engineering.
Subjects
Protein Engineering , Proteins
ABSTRACT
Molecular simulations have been extensively employed to accelerate biocatalytic discoveries. Enzyme functional descriptors derived from molecular simulations have been leveraged to guide the search for beneficial enzyme mutants. However, the ideal active-site region size for computing the descriptors over multiple enzyme variants remains untested. Here, we conducted convergence tests for dynamics-derived and electrostatic descriptors on 18 Kemp eliminase variants across six active-site regions with various boundary distances to the substrate. The tested descriptors include the root-mean-square deviation of the active-site region, the solvent accessible surface area ratio between the substrate and active site, and the projection of the electric field (EF) on the breaking C-H bond. All descriptors were evaluated using molecular mechanics methods. To understand the effects of electronic structure, the EF was also evaluated using quantum mechanics/molecular mechanics methods. The descriptor values were computed for 18 Kemp eliminase variants. Spearman correlation matrices were used to determine the region size condition under which further expansion of the region boundary does not substantially change the ranking of descriptor values. We observed that protein dynamics-derived descriptors, including RMSD_active_site and SASA_ratio, converge at a distance cutoff of 5 Å from the substrate. The electrostatic descriptor, EF_C-H, converges at 6 Å using molecular mechanics methods with truncated enzyme models and at 4 Å using quantum mechanics/molecular mechanics methods with the whole enzyme model. This study serves as a reference for selecting descriptors in predictive modeling for enzyme engineering.
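The convergence test described above can be sketched as follows. Each row of the input holds one descriptor evaluated for all variants at one region size, ordered from the smallest to the largest region. The convergence criterion (rank correlation with every larger region at or above a threshold), the threshold value of 0.9, and the function names are assumptions made for illustration; the ranking helper assumes no tied values.

```python
import numpy as np


def _ranks(values):
    """Ranks 0..n-1 of the values (assumes no ties)."""
    order = np.argsort(values)
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(len(values))
    return ranks


def spearman_matrix(descriptor_by_region):
    """Spearman correlation between every pair of region sizes.

    descriptor_by_region has shape (n_regions, n_variants).
    """
    ranked = np.vstack([_ranks(row) for row in np.asarray(descriptor_by_region)])
    return np.corrcoef(ranked)


def converged_region_index(descriptor_by_region, threshold=0.9):
    """Smallest region whose variant ranking agrees with all larger regions."""
    rho = spearman_matrix(descriptor_by_region)
    for i in range(rho.shape[0]):
        if np.all(rho[i, i:] >= threshold):
            return i
    return None


# Toy data: 3 region sizes x 4 variants; the ranking settles from the second region on.
toy = [[3.0, 1.0, 2.0, 4.0],
       [1.0, 2.0, 3.0, 4.0],
       [1.0, 2.0, 3.0, 4.5]]
print(converged_region_index(toy))
```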
ABSTRACT
Molecular dynamics simulations have been extensively employed to reveal the roles of protein dynamics in mediating enzyme catalysis. However, simulation-derived predictive descriptors that inform the impacts of mutations on catalytic turnover numbers remain largely unexplored. In this work, we report the identification of molecular modeling-derived descriptors to predict mutation effect on the turnover number of lactonase SsoPox with both native and non-native substrates. The study consists of 10 enzyme-substrate complexes resulting from a combination of five enzyme variants with two substrates. For each complex, we derived 15 descriptors from molecular dynamics simulations and applied principal component analysis to rank the predictive capability of the descriptors. A top-ranked descriptor was identified, which is the solvent-accessible surface area (SASA) ratio of the substrate to the active site pocket. A uniform volcano-shaped plot was observed in the distribution of experimental activation free energy against the SASA ratio. To achieve efficient lactonase hydrolysis, a non-native substrate-bound enzyme variant needs to involve a similar range of the SASA ratio to the native substrate-bound wild-type enzyme. The descriptor reflects how well the enzyme active site pocket accommodates a substrate for reaction, which has the potential of guiding optimization of enzyme reaction turnover for non-native chemical transformations.
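One way principal component analysis can be used to rank descriptors is by the magnitude of their loadings on the first principal component. The abstract does not specify the ranking criterion, so this choice, the standardization step, and the function name are assumptions. In the setting above, the input would be a 10 × 15 matrix (complexes × descriptors).

```python
import numpy as np


def pca_descriptor_ranking(X):
    """Rank descriptors by absolute loading on the first principal component.

    X has shape (n_samples, n_descriptors). Columns are standardized first so
    that descriptors with different units contribute on the same scale.
    Returns (ranking, explained_variance_ratio), where ranking lists the
    descriptor indices from the largest to the smallest PC1 loading.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    ranking = np.argsort(-np.abs(Vt[0]))
    return ranking, explained
```

A descriptor that varies together with many others ends up near the top of this ranking, while one that varies independently ends up near the bottom. Whether a top-ranked descriptor is predictive still has to be checked against experimental data, as was done with the activation free energies above.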
Subjects
Catalytic Domain , Molecular Dynamics Simulation , Mutation , Catalysis , Catalytic Domain/genetics , Mutation/genetics , Mutation/physiology , Substrate Specificity
ABSTRACT
Hydrolases are a critical component for modern chemical, pharmaceutical, and environmental sciences. Identifying mutations that enhance catalytic efficiency is a roadblock to designing and discovering new hydrolases for broad academic and industrial uses. Here, we report the statistical profiling for rate-perturbing mutant hydrolases with a single amino acid substitution. We constructed an integrated structure-kinetics database for hydrolases, IntEnzyDB, which contains 3907 kcat values, 4175 KM values, and 2715 Protein Data Bank IDs. IntEnzyDB adopts a relational architecture with a flattened data structure, enabling facile and efficient access to clean and tabulated data for machine learning uses. We conducted statistical analyses on how single amino acid mutations influence the turnover number (i.e., kcat) and efficiency (i.e., kcat/KM), with a particular emphasis on profiling the features for rate-enhancing mutations. The results show that mutation to bulky nonpolar residues with a hydrocarbon chain involves a higher likelihood for rate acceleration than to other types of residues. Linear regression models reveal geometric descriptors of substrate and mutation residues that mediate rate-perturbing outcomes for hydrolases with bulky nonpolar mutations. On the basis of the analyses of the structure-kinetics relationship, we observe that the propensity for rate enhancement is independent of protein size. In addition, we observe that distal mutations (i.e., >10 Å from the active site) in hydrolases are significantly more prone to induce efficiency neutrality and avoid efficiency deletion but involve similar propensity for rate enhancement. The studies reveal the statistical features for identifying rate-enhancing mutations in hydrolases, which will potentially guide hydrolase discovery in biocatalysis.
Subjects
Amino Acids , Hydrolases , Amino Acid Substitution , Hydrolases/genetics , Hydrolases/metabolism , Kinetics , Site-Directed Mutagenesis , Mutation , Substrate Specificity
ABSTRACT
Vibrational spectroscopy provides a powerful tool to probe the structure and dynamics of nucleic acids because specific normal modes, particularly the base carbonyl stretch modes, are highly sensitive to the hydrogen bonding patterns and stacking configurations in these biomolecules. In this work, we develop vibrational frequency maps for the C=O and C=C stretches in nucleobases that allow the calculations of their site frequencies directly from molecular dynamics simulations. We assess the frequency maps by applying them to nucleobase derivatives in aqueous solutions and nucleosides in organic solvents and demonstrate that the predicted infrared spectra are in good agreement with experimental measurements. The frequency maps can be readily used to model the linear and nonlinear vibrational spectroscopy of nucleic acids and elucidate the molecular origin of the experimentally observed spectral features.
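A frequency map relates the instantaneous frequency of a chromophore to its electrostatic environment in each simulation frame. The abstract does not give the functional form of the maps, so the sketch below assumes the simplest common variant: a linear dependence on the electric field projected along the bond. The function names, the gas-phase frequency, the slope, and the charge arrangement are all hypothetical, and the Coulomb prefactor is set to 1 (arbitrary field units).

```python
import numpy as np


def electric_field(site, charges, positions):
    """Electric field at `site` from point charges (Coulomb prefactor set to 1)."""
    site = np.asarray(site, dtype=float)
    q = np.asarray(charges, dtype=float)
    diff = site - np.asarray(positions, dtype=float)   # vectors from each charge to the site
    dist = np.linalg.norm(diff, axis=1)
    return np.sum(q[:, None] * diff / dist[:, None] ** 3, axis=0)


def mapped_frequency(omega0, slope, field, bond_direction):
    """Linear map: omega = omega0 + slope * (field projected on the bond)."""
    unit = np.asarray(bond_direction, dtype=float)
    unit = unit / np.linalg.norm(unit)
    return omega0 + slope * float(field @ unit)


# One made-up environment charge next to a chromophore at the origin.
field = electric_field([0.0, 0.0, 0.0], [1.0], [[2.0, 0.0, 0.0]])
print(mapped_frequency(1700.0, 1000.0, field, [1.0, 0.0, 0.0]))
```

Evaluating such a map for every frame of a trajectory gives a fluctuating site frequency, which is the input a spectral calculation needs.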