1.
Acc Chem Res ; 54(4): 827-836, 2021 02 16.
Article in English | MEDLINE | ID: mdl-33534534

ABSTRACT

Machine-readable chemical structure representations are foundational in all attempts to harness machine learning for the prediction of reactivities, selectivities, and chemical properties directly from molecular structure. The featurization of discrete chemical structures into a continuous vector space is a critical phase undertaken before model selection, and the development of new ways to quantitatively encode molecules is an active area of research. In this Account, we highlight the application and suitability of different representations, from expert-guided "engineered" descriptors to automatically "learned" features, in different prediction tasks relevant to organic and organometallic chemistry, where differing amounts of training data are available. These tasks include statistical models of stereo- and enantioselectivity, thermochemistry, and kinetics developed using experimental and quantum chemical data. The use of expert-guided molecular descriptors provides an opportunity to incorporate chemical knowledge, domain expertise, and physical constraints into statistical modeling. In applications to stereoselective organic and organometallic catalysis, where data sets may be relatively small and 3D-geometries and conformations play an important role, mechanistically informed features can be used successfully to obtain predictive statistical models that are also chemically interpretable. We provide an overview of several recent applications of this approach to obtain quantitative models for reactivity and selectivity, where topological descriptors, quantum mechanical calculations of electronic and steric properties, along with conformational ensembles, all feature as essential ingredients of the molecular representations used. Alternatively, more flexible, general-purpose molecular representations such as attributed molecular graphs can be used with machine learning approaches to learn the complex relationship between a structure and prediction target.
This approach has the potential to out-perform more traditional representation methods such as "hand-crafted" molecular descriptors, particularly as data set sizes grow. One area where this is particularly relevant is in the use of large sets of quantum mechanical data to train quantitative structure-property relationships. A general approach toward curating useful data sets and training highly accurate graph neural network models is discussed in the context of organic bond dissociation enthalpies, where this strategy outperforms regression using precomputed descriptors. Finally, we describe how graph neural network predictions can be incorporated into mechanistically informed statistical models of chemical reactivity and selectivity. Once trained, this approach avoids the expensive computational overhead associated with quantum mechanical calculations, while maintaining chemical interpretability. We illustrate examples for which fast predictions of bond dissociation enthalpy and of the identities of radicals formed through cleavage of a molecule's weakest bond are used in simple physical models of site-selectivity and reactivity.

2.
Proc Natl Acad Sci U S A ; 116(52): 26421-26430, 2019 Dec 26.
Article in English | MEDLINE | ID: mdl-31843899

ABSTRACT

Lignocellulosic biomass offers a renewable carbon source which can be anaerobically digested to produce short-chain carboxylic acids. Here, we assess fuel properties of oxygenates accessible from catalytic upgrading of these acids a priori for their potential to serve as diesel bioblendstocks. Ethers derived from C2 and C4 carboxylic acids are identified as advantaged fuel candidates with significantly improved ignition quality (>56% cetane number increase) and reduced sooting (>86% yield sooting index reduction) when compared to commercial petrodiesel. The prescreening process informed conversion pathway selection toward a C11 branched ether, 4-butoxyheptane, which showed promise for fuel performance and health- and safety-related attributes. A continuous, solvent-free production process was then developed using metal oxide acidic catalysts to provide improved thermal stability, water tolerance, and yields. Liter-scale production of 4-butoxyheptane enabled fuel property testing to confirm predicted fuel properties, while incorporation into petrodiesel at 20 vol % demonstrated 10% improvement in ignition quality and 20% reduction in intrinsic sooting tendency. Storage stability of the pure bioblendstock and 20 vol % blend was confirmed with a common fuel antioxidant, as was compatibility with elastomeric components within existing engine and fueling infrastructure. Technoeconomic analysis of the conversion process identified major cost drivers to guide further research and development. Life-cycle analysis determined the potential to reduce greenhouse gas emissions by 50 to 271% relative to petrodiesel, depending on treatment of coproducts.

3.
PLoS Comput Biol ; 15(11): e1007424, 2019 11.
Article in English | MEDLINE | ID: mdl-31682600

ABSTRACT

Modern biological tools generate a wealth of data on metabolite and protein concentrations that can be used to help inform new strain designs. However, learning from these data to predict how a cell will respond to genetic changes, a key need for engineering, remains challenging. A promising technique for leveraging omics measurements in metabolic modeling involves the construction of kinetic descriptions of the enzymatic reactions that occur within a cell. Parameterizing these models from biological data can be computationally difficult, since methods must also quantify the uncertainty in model parameters resulting from the observed data. While the field of Bayesian inference offers a wide range of methods for efficiently estimating distributions in parameter uncertainty, such techniques are poorly suited to traditional kinetic models due to their complex rate laws and resulting nonlinear dynamics. In this paper, we employ linear-logarithmic kinetics to simplify the calculation of steady-state flux distributions and enable efficient sampling and inference methods. We demonstrate that detailed information on the posterior distribution of parameters can be obtained efficiently at a variety of problem scales, including nearly genome-scale kinetic models trained on multiomics datasets. These results allow modern Bayesian machine learning tools to be leveraged in understanding biological data and in developing new, efficient strain designs.
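
The simplification that linear-logarithmic (lin-log) kinetics brings can be illustrated with a small sketch. In the standard lin-log form, each flux is written relative to a reference state as v = ê · v_ref · (1 + Ex · χ), with χ = ln(x / x_ref), so the steady-state condition N v = 0 becomes linear in χ. The two-reaction toy pathway, the elasticity values, and every name below are assumptions chosen for illustration; this is not the authors' model or code.

```python
import numpy as np

def linlog_steady_state(N, v_ref, e_hat, Ex):
    """Solve N v = 0 for chi = ln(x / x_ref) under lin-log kinetics.

    Rate law (standard lin-log form): v = e_hat * v_ref * (1 + Ex @ chi)
    N     : stoichiometric matrix (metabolites x reactions)
    v_ref : reference-state fluxes
    e_hat : enzyme levels relative to the reference state
    Ex    : elasticity matrix (reactions x metabolites)
    """
    W = N * (e_hat * v_ref)               # scale column j of N by e_hat[j] * v_ref[j]
    A = W @ Ex                            # linear system in chi
    b = -W @ np.ones(len(v_ref))
    chi, *_ = np.linalg.lstsq(A, b, rcond=None)
    v = e_hat * v_ref * (1.0 + Ex @ chi)  # fluxes at the new steady state
    return chi, v

# Hypothetical toy pathway: S -> X -> P with one internal metabolite X.
N = np.array([[1.0, -1.0]])
v_ref = np.array([1.0, 1.0])
Ex = np.array([[-0.5], [1.0]])            # assumed elasticities, illustration only
e_hat = np.array([2.0, 1.0])              # doubling the first enzyme

chi, v = linlog_steady_state(N, v_ref, e_hat, Ex)
print("log concentration change:", chi)
print("steady-state fluxes:", v)
```

Because the steady state follows from a single linear solve, it is cheap to re-evaluate for many sampled elasticity values, which is the property the abstract credits with making sampling and inference efficient.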


Subject(s)
Enzymes/metabolism , Metabolism/physiology , Metabolomics/methods , Algorithms , Bayes Theorem , Genomics/methods , Kinetics , Machine Learning , Metabolic Engineering/statistics & numerical data , Models, Biological
4.
J Chem Phys ; 150(23): 234111, 2019 Jun 21.
Article in English | MEDLINE | ID: mdl-31228909

ABSTRACT

Machine learning methods have shown promise in predicting molecular properties, and given sufficient training data, machine learning approaches can enable rapid high-throughput virtual screening of large libraries of compounds. Graph-based neural network architectures have emerged in recent years as the most successful approach for predictions based on molecular structure and have consistently achieved the best performance on benchmark quantum chemical datasets. However, these models have typically required optimized 3D structural information for the molecule to achieve the highest accuracy. These 3D geometries are costly to compute for high levels of theory, limiting the applicability and practicality of machine learning methods in high-throughput screening applications. In this study, we present a new database of candidate molecules for organic photovoltaic applications, comprising approximately 91 000 unique chemical structures. Compared to existing datasets, this dataset contains substantially larger molecules (up to 200 atoms) as well as extrapolated properties for long polymer chains. We show that message-passing neural networks trained with and without 3D structural information for these molecules achieve similar accuracy, comparable to state-of-the-art methods on existing benchmark datasets. These results therefore emphasize that for larger molecules with practical applications, near-optimal prediction results can be obtained without using optimized 3D geometry as an input. We further show that learned molecular representations can be leveraged to reduce the training data required to transfer predictions to a new density functional theory functional.

5.
Proc Natl Acad Sci U S A ; 113(16): 4512-7, 2016 Apr 19.
Article in English | MEDLINE | ID: mdl-27044085

ABSTRACT

In the mammalian suprachiasmatic nucleus (SCN), noisy cellular oscillators communicate within a neuronal network to generate precise system-wide circadian rhythms. Although the intracellular genetic oscillator and intercellular biochemical coupling mechanisms have been examined previously, the network topology driving synchronization of the SCN has not been elucidated. This network has been particularly challenging to probe, due to its oscillatory components and slow coupling timescale. In this work, we investigated the SCN network at a single-cell resolution through a chemically induced desynchronization. We then inferred functional connections in the SCN by applying the maximal information coefficient statistic to bioluminescence reporter data from individual neurons while they resynchronized their circadian cycling. Our results demonstrate that the functional network of circadian cells associated with resynchronization has small-world characteristics, with a node degree distribution that is exponential. We show that hubs of this small-world network are preferentially located in the central SCN, with sparsely connected shells surrounding these cores. Finally, we used two computational models of circadian neurons to validate our predictions of network structure.


Subject(s)
Circadian Clocks/physiology , Nerve Net/metabolism , Suprachiasmatic Nucleus/metabolism , Animals , Genes, Reporter , Mice, Transgenic , Nerve Net/cytology , Suprachiasmatic Nucleus/cytology
6.
Appl Environ Microbiol ; 83(17)2017 09 01.
Article in English | MEDLINE | ID: mdl-28625987

ABSTRACT

Actinobacillus succinogenes, a Gram-negative facultative anaerobe, exhibits the native capacity to convert pentose and hexose sugars to succinic acid (SA) with high yield as a tricarboxylic acid (TCA) cycle intermediate. In addition, A. succinogenes is capnophilic, incorporating CO2 into SA, making this organism an ideal candidate host for conversion of lignocellulosic sugars and CO2 to an emerging commodity bioproduct sourced from renewable feedstocks. In this work, we report the development of facile metabolic engineering capabilities in A. succinogenes, enabling examination of SA flux determinants via knockout of the primary competing pathways (namely, acetate and formate production) and overexpression of the key enzymes in the reductive branch of the TCA cycle leading to SA. Batch fermentation experiments with the wild-type and engineered strains using pentose-rich sugar streams demonstrate that the overexpression of the SA biosynthetic machinery (in particular, the enzyme malate dehydrogenase) enhances flux to SA. Additionally, removal of competitive carbon pathways leads to higher-purity SA but also triggers the generation of by-products not previously described from this organism (e.g., lactic acid). The resultant engineered strains also lend insight into energetic and redox balance and elucidate mechanisms governing organic acid biosynthesis in this important natural SA-producing microbe. IMPORTANCE Succinic acid production from lignocellulosic residues is a potential route for enhancing the economic feasibility of modern biorefineries. Here, we employ facile genetic tools to systematically manipulate competing acid production pathways and overexpress the succinic acid-producing machinery in Actinobacillus succinogenes. Furthermore, the resulting strains are evaluated via fermentation on relevant pentose-rich sugar streams representative of those from corn stover.
Overall, this work demonstrates genetic modifications that can lead to succinic acid production improvements and identifies key flux determinants and new bottlenecks and energetic needs when removing by-product pathways in A. succinogenes metabolism.


Subject(s)
Actinobacillus/genetics , Actinobacillus/metabolism , Succinic Acid/metabolism , Bioreactors/microbiology , Fermentation , Formates/metabolism , Glucose/metabolism , Metabolic Engineering
7.
Proc Natl Acad Sci U S A ; 111(5): 2040-5, 2014 Feb 04.
Article in English | MEDLINE | ID: mdl-24449901

ABSTRACT

Posttranslational regulation of clock proteins is an essential part of mammalian circadian rhythms, conferring sensitivity to metabolic state and offering promising targets for pharmacological control. Two such regulators, casein kinase 1 (CKI) and F-box and leucine-rich repeat protein 3 (FBXL3), modulate the stability of closely linked core clock proteins period (PER) and cryptochrome (CRY), respectively. Inhibition of either CKI or FBXL3 leads to longer periods, and their effects are independent despite targeting proteins with similar roles in clock function. A mechanistic understanding of this independence, however, has remained elusive. Our analysis of cellular circadian clock gene reporters further differentiated between the actions of CKI and FBXL3 by revealing opposite amplitude responses from each manipulation. To understand the functional relationship between the CKI-PER and FBXL3-CRY pathways, we generated robust mechanistic predictions by applying a bootstrap uncertainty analysis to multiple mathematical circadian models. Our results indicate that CKI primarily regulates the accumulating phase of the PER-CRY repressive complex by controlling the nuclear import rate, whereas FBXL3 separately regulates the duration of transcriptional repression in the nucleus. Dynamic simulations confirmed that this spatiotemporal separation is able to reproduce the independence of the two regulators in period regulation, as well as their opposite amplitude effect. As a result, this study provides further insight into the molecular clock machinery responsible for maintaining robust circadian rhythms.


Subject(s)
Circadian Clocks , Cryptochromes/metabolism , Mammals/metabolism , Period Circadian Proteins/metabolism , Protein Processing, Post-Translational , Spatio-Temporal Analysis , Adenine/analogs & derivatives , Adenine/pharmacology , Animals , Carbazoles/pharmacology , Circadian Clocks/drug effects , F-Box Proteins/metabolism , Genes, Reporter , HEK293 Cells , Humans , Models, Biological , Protein Processing, Post-Translational/drug effects , Signal Transduction/drug effects , Sulfonamides/pharmacology , Time Factors
8.
PLoS Comput Biol ; 11(11): e1004451, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26588000

ABSTRACT

Stochastic noise at the cellular level has been shown to play a fundamental role in circadian oscillations, influencing how groups of cells entrain to external cues and likely serving as the mechanism by which cell-autonomous rhythms are generated. Despite this importance, few studies have investigated how clock perturbations affect stochastic noise, even as increasing numbers of high-throughput screens categorize how gene knockdowns or small molecules can change clock period and amplitude. This absence is likely due to the difficulty associated with measuring cell-autonomous stochastic noise directly, which currently requires the careful collection and processing of single-cell data. In this study, we show that the damping rate of population-level bioluminescence recordings can serve as an accurate measure of overall stochastic noise, and one that can be applied to future and existing high-throughput circadian screens. Using cell-autonomous fibroblast data, we first show directly that higher noise at the single-cell level results in faster damping at the population level. Next, we show that the damping rate of cultured cells can be changed in a dose-dependent fashion by small molecule modulators, and confirm that such a change can be explained by single-cell noise using a mathematical model. We further demonstrate the insights that can be gained by applying our method to a genome-wide siRNA screen, revealing that stochastic noise is altered independently from period, amplitude, and phase. Finally, we hypothesize that the unperturbed clock is highly optimized for robust rhythms, as very few gene perturbations are capable of simultaneously increasing amplitude and lowering stochastic noise. Ultimately, this study demonstrates the importance of considering the effect of circadian perturbations on stochastic noise, particularly with regard to the development of small-molecule circadian therapeutics.
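
One simple way to read a damping rate off a population-level recording is sketched below. It assumes the signal behaves like an exponentially damped sinusoid and regresses the logarithm of successive peak heights against time. The abstract does not say how the authors estimated the damping rate, so the function name, the synthetic trace, and the peak-based estimator are all assumptions made for illustration.

```python
import numpy as np

def damping_rate_from_peaks(t, y):
    """Estimate d in y ~ exp(-d * t) * cos(...) from the decay of peak heights."""
    # Interior local maxima of the trace.
    is_peak = (y[1:-1] > y[:-2]) & (y[1:-1] >= y[2:])
    idx = np.where(is_peak)[0] + 1
    idx = idx[y[idx] > 0]                  # the logarithm needs positive peaks
    # Peak heights decay as exp(-d * t), so log(height) is linear in t.
    slope, _intercept = np.polyfit(t[idx], np.log(y[idx]), 1)
    return -slope

# Synthetic, noise-free trace: 24 h period, damping rate 0.02 per hour.
t = np.linspace(0.0, 120.0, 2401)
y = np.exp(-0.02 * t) * np.cos(2.0 * np.pi * t / 24.0)

rate = damping_rate_from_peaks(t, y)
print(f"estimated damping rate: {rate:.4f} per hour")
```

Real bioluminescence traces would need detrending and smoothing before peak picking; a nonlinear fit of the full damped sinusoid is a common alternative.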


Subject(s)
Circadian Rhythm/physiology , Computational Biology/methods , Models, Biological , Animals , Cells, Cultured , Circadian Rhythm/drug effects , Computer Simulation , Gene Knockdown Techniques , Mice , RNA Interference/physiology , RNA, Small Interfering/pharmacology , Stochastic Processes
9.
Biophys J ; 107(11): 2712-22, 2014 Dec 02.
Article in English | MEDLINE | ID: mdl-25468350

ABSTRACT

Bioluminescence rhythms from cellular reporters have become the most common method used to quantify oscillations in circadian gene expression. These experimental systems can reveal phase and amplitude change resulting from circadian disturbances, and can be used in conjunction with mathematical models to lend further insight into the mechanistic basis of clock amplitude regulation. However, bioluminescence experiments track the mean output from thousands of noisy, uncoupled oscillators, obscuring the direct effect of a given stimulus on the genetic regulatory network. In many cases, it is unclear whether changes in amplitude are due to individual changes in gene expression level or to a change in coherence of the population. Although such systems can be modeled using explicit stochastic simulations, these models are computationally cumbersome and limit analytical insight into the mechanisms of amplitude change. We therefore develop theoretical and computational tools to approximate the mean expression level in large populations of noninteracting oscillators, and further define computationally efficient amplitude response calculations to describe phase-dependent amplitude change. At the single-cell level, a mechanistic nonlinear ordinary differential equation model is used to calculate the transient response of each cell to a perturbation, whereas population-level dynamics are captured by coupling this detailed model to a phase density function. Our analysis reveals that amplitude changes mediated at either the individual-cell or the population level can be distinguished in tissue-level bioluminescence data without the need for single-cell measurements. We demonstrate the effectiveness of the method by modeling experimental bioluminescence profiles of light-sensitive fibroblasts, reconciling the conclusions of two seemingly contradictory studies. 
This modeling framework allows a direct comparison between in vitro bioluminescence experiments and in silico ordinary differential equation models, and will lead to a better quantitative understanding of the factors that affect clock amplitude.


Subject(s)
Circadian Rhythm , Fibroblasts/metabolism , Genes, Reporter , Luminescent Measurements , Animals , Gene Expression Profiling , Mice , Models, Biological , NIH 3T3 Cells
10.
JACS Au ; 3(1): 113-123, 2023 Jan 23.
Article in English | MEDLINE | ID: mdl-36711088

ABSTRACT

The discovery of new materials in unexplored chemical spaces necessitates quick and accurate prediction of thermodynamic stability, often assessed using density functional theory (DFT), and efficient search strategies. Here, we develop a new approach to finding stable inorganic functional materials. We start by defining an upper bound to the fully relaxed energy obtained via DFT as the energy resulting from a constrained optimization over only cell volume. Because the fractional atomic coordinates for these calculations are known a priori, this upper bound energy can be quickly and accurately predicted with a scale-invariant graph neural network (GNN). We generate new structures via ionic substitution of known prototypes, and train our GNN on a new database of 128 000 DFT calculations comprising both fully relaxed and volume-only relaxed structures. By minimizing the predicted upper-bound energy, we discover new stable structures with over 99% accuracy (versus DFT). We demonstrate the method by finding promising new candidates for solid-state battery (SSB) electrolytes that not only possess the required stability, but also additional functional properties such as large electrochemical stability windows and high conduction ion fraction. We expect this proposed framework to be directly applicable to a wide range of design challenges in materials science.

11.
Macromolecules ; 56(21): 8547-8557, 2023 Nov 14.
Article in English | MEDLINE | ID: mdl-38024155

ABSTRACT

A necessary transformation for a sustainable economy is the transition from fossil-derived plastics to polymers derived from biomass and waste resources. While renewable feedstocks can enhance material performance through unique chemical moieties, probing the vast material design space by experiment alone is not practically feasible. Here, we develop a machine-learning-based tool, PolyID, to reduce the design space of renewable feedstocks to enable efficient discovery of performance-advantaged, biobased polymers. PolyID is a multioutput, graph neural network specifically designed to increase accuracy and to enable quantitative structure-property relationship (QSPR) analysis for polymers. It includes a novel domain-of-validity method that was developed and applied to demonstrate how gaps in training data can be filled to improve accuracy. The model was benchmarked with both a 20% held-out subset of the original training data and 22 experimentally synthesized polymers. A mean absolute error for the glass transition temperatures of 19.8 and 26.4 °C was achieved for the test and experimental data sets, respectively. Predictions were made on polymers composed of monomers from four databases that contain biologically accessible small molecules: MetaCyc, MINEs, KEGG, and BiGG. From 1.4 × 10⁶ accessible biobased polymers, we identified five poly(ethylene terephthalate) (PET) analogues with predicted improvements to thermal and transport performance. Experimental validation for one of the PET analogues demonstrated a glass transition temperature between 85 and 112 °C, which is higher than PET and within the predicted range of the PolyID tool. In addition to accurate predictions, we show how the model's predictions are explainable through analysis of individual bond importance for a biobased nylon. Overall, PolyID can aid the biobased polymer practitioner to navigate the vast number of renewable polymers to discover sustainable materials with enhanced performance.

12.
Nat Commun ; 13(1): 4925, 2022 08 22.
Article in English | MEDLINE | ID: mdl-35995792

ABSTRACT

Muconic acid is a bioprivileged molecule that can be converted into direct replacement chemicals for incumbent petrochemicals and performance-advantaged bioproducts. In this study, Pseudomonas putida KT2440 is engineered to convert glucose and xylose, the primary carbohydrates in lignocellulosic hydrolysates, to muconic acid using a model-guided strategy to maximize the theoretical yield. Using adaptive laboratory evolution (ALE) and metabolic engineering in a strain engineered to express the D-xylose isomerase pathway, we demonstrate that mutations in the heterologous D-xylose:H⁺ symporter (XylE), increased expression of a major facilitator superfamily transporter (PP_2569), and overexpression of aroB encoding the native 3-dehydroquinate synthase, enable efficient muconic acid production from glucose and xylose simultaneously. Using the rationally engineered strain, we produce 33.7 g L⁻¹ muconate at 0.18 g L⁻¹ h⁻¹ and a 46% molar yield (92% of the maximum theoretical yield). This engineering strategy is promising for the production of other shikimate pathway-derived compounds from lignocellulosic sugars.
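
The reported performance figures imply two further quantities that follow by simple arithmetic, worked through below. The variable names are illustrative, and the run-time estimate assumes that the quoted productivity is an average over the whole cultivation, which the abstract does not state.

```python
# Values quoted in the abstract.
titer_g_per_l = 33.7             # final muconate titer, g/L
productivity_g_per_l_h = 0.18    # productivity, g/L/h
molar_yield = 0.46               # mol muconate per mol sugar
fraction_of_max = 0.92           # achieved share of the maximum theoretical yield

# Maximum theoretical molar yield implied by the two yield figures.
implied_max_molar_yield = molar_yield / fraction_of_max

# Cultivation time implied by titer and productivity
# (assumption: productivity is averaged over the whole run).
implied_run_time_h = titer_g_per_l / productivity_g_per_l_h

print(f"implied maximum theoretical yield: {implied_max_molar_yield:.2f} mol/mol")
print(f"implied run time: {implied_run_time_h:.0f} h")
```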


Subject(s)
Pseudomonas putida , Xylose , Fermentation , Glucose/metabolism , Metabolic Engineering , Pseudomonas putida/genetics , Pseudomonas putida/metabolism , Sorbic Acid/analogs & derivatives , Xylose/metabolism
13.
Chem Sci ; 12(39): 13158-13166, 2021 Oct 13.
Article in English | MEDLINE | ID: mdl-34745547

ABSTRACT

Long-lived organic radicals are promising candidates for the development of high-performance energy solutions such as organic redox batteries, transistors, and light-emitting diodes. However, "stable" organic radicals that remain unreactive for an extended time and that can be stored and handled under ambient conditions are rare. A necessary but not sufficient condition for organic radical stability is the presence of thermodynamic stabilization, such as conjugation with an adjacent π-bond or lone-pair, or hyperconjugation with a σ-bond. However, thermodynamic factors alone do not result in radicals with extended lifetimes: many resonance-stabilized radicals are transient species that exist for less than a millisecond. Kinetic stabilization is also necessary for persistence, such as steric effects that inhibit radical dimerization or reaction with solvent molecules. We describe a quantitative approach to map organic radical stability, using molecular descriptors intended to capture thermodynamic and kinetic considerations. The comparison of an extensive dataset of quantum chemical calculations of organic radicals with experimentally-known stable radical species reveals a region of this feature space where long-lived radicals are located. These descriptors, based upon maximum spin density and buried volume, are combined into a single metric, the radical stability score, that outperforms thermodynamic scales based on bond dissociation enthalpies in identifying remarkably long-lived radicals. This provides an objective and accessible metric for use in future molecular design and optimization campaigns. We demonstrate this approach in identifying Pareto-optimal candidates for stable organic radicals.
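
The idea of selecting Pareto-optimal candidates on two descriptors can be sketched as follows. The sketch assumes that a lower maximum spin density and a higher buried volume are both desirable, which is an interpretation of the abstract rather than a stated rule, and the candidate names and numbers are invented for illustration. It does not reproduce the authors' radical stability score.

```python
def pareto_front(candidates):
    """Return names of candidates not dominated on (max spin density, buried volume).

    Assumption: lower spin density and higher buried volume are both better.
    candidates: list of (name, max_spin_density, buried_volume) tuples.
    """
    front = []
    for name, spin, vbur in candidates:
        dominated = any(
            (s2 <= spin and v2 >= vbur) and (s2 < spin or v2 > vbur)
            for _, s2, v2 in candidates
        )
        if not dominated:
            front.append(name)
    return front

# Hypothetical descriptor values, not taken from the paper.
radicals = [
    ("A", 0.30, 60.0),
    ("B", 0.50, 80.0),
    ("C", 0.60, 50.0),
    ("D", 0.30, 70.0),
]

front = pareto_front(radicals)
print(front)
```

A candidate stays on the front only if no other candidate is at least as good on both descriptors and strictly better on one, so the front exposes the trade-off without fixing a weighting between the two descriptors.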

14.
Front Bioeng Biotechnol ; 9: 707749, 2021.
Article in English | MEDLINE | ID: mdl-34381766

ABSTRACT

Prior engineering of the ethanologen Zymomonas mobilis has enabled it to metabolize xylose and to produce 2,3-butanediol (2,3-BDO) as a dominant fermentation product. When co-fermenting with xylose, glucose is preferentially utilized, even though xylose metabolism generates ATP more efficiently during 2,3-BDO production on a BDO-mol basis. To gain a deeper understanding of Z. mobilis metabolism, we first estimated the kinetic parameters of the glucose facilitator protein of Z. mobilis by fitting a kinetic uptake model, which shows that the maximum transport capacity for glucose is seven times higher than that for xylose, and that the transporter's affinity for glucose is six times higher than its affinity for xylose. With these estimated kinetic parameters, we further compared the thermodynamic driving force and enzyme protein cost of glucose and xylose metabolism. We find that, although xylose utilization yields 20% more ATP stoichiometrically, glucose metabolism is thermodynamically more favorable, with a 6% greater cumulative Gibbs free energy change, and more economical, requiring 37% less enzyme cost at the initial stage; it sustains these advantages in thermodynamic driving force and protein cost through the fermentation process until glucose is exhausted. Glucose-6-phosphate dehydrogenase (g6pdh), glyceraldehyde-3-phosphate dehydrogenase (gapdh) and phosphoglycerate mutase (pgm) are identified as thermodynamic bottlenecks in the glucose utilization pathway, with two additional bottleneck enzymes, xylose isomerase and ribulose-5-phosphate epimerase, in xylose metabolism. Acetolactate synthase is identified as a potential engineering target for optimizing the protein cost supporting unit metabolic flux. Pathway analysis was then extended to the core stoichiometric matrix of Z. mobilis metabolism. Growth was simulated by dynamic flux balance analysis, and the model was validated, showing good agreement with experimental data.
Dynamic FBA simulations suggest that high agitation is preferable for increasing 2,3-BDO productivity, while moderate agitation benefits the 2,3-BDO titer. Taken together, this work provides thermodynamic and kinetic insights into Z. mobilis metabolism on dual substrates and guidance for bioengineering efforts to increase hydrocarbon fuel production.
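
The reported transport ratios can be placed in a minimal uptake sketch. It assumes that glucose and xylose compete for a single transporter with Michaelis-Menten kinetics, and it reads "six times higher affinity" as a six-fold lower half-saturation constant. Both are assumptions, as are the absolute parameter values and all names; only the 7:1 and 6:1 ratios come from the abstract.

```python
def competitive_uptake(glc, xyl, vmax_glc, km_glc, vmax_xyl, km_xyl):
    """Uptake rates of two substrates competing for one transporter."""
    denom = 1.0 + glc / km_glc + xyl / km_xyl
    v_glc = vmax_glc * (glc / km_glc) / denom
    v_xyl = vmax_xyl * (xyl / km_xyl) / denom
    return v_glc, v_xyl

# Hypothetical absolute values; only the ratios follow the abstract.
VMAX_XYL = 1.0               # arbitrary units
KM_XYL = 6.0                 # arbitrary concentration units
VMAX_GLC = 7.0 * VMAX_XYL    # seven-fold higher transport capacity
KM_GLC = KM_XYL / 6.0        # six-fold higher affinity, read as lower Km

v_glc, v_xyl = competitive_uptake(10.0, 10.0, VMAX_GLC, KM_GLC, VMAX_XYL, KM_XYL)
print(f"glucose:xylose uptake ratio at equal concentrations: {v_glc / v_xyl:.0f}")
```

Under these assumptions the two ratios multiply, so glucose is taken up far faster than xylose at equal concentrations, which is consistent with the preferential glucose utilization described above.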

15.
Comput Struct Biotechnol J ; 19: 214-225, 2021.
Article in English | MEDLINE | ID: mdl-33425253

ABSTRACT

Microorganisms rely on protein interactions to transmit signals, react to stimuli, and grow. One of the best ways to understand these protein interactions is through structural characterization. However, in the past, structural knowledge was limited to stable, high-affinity complexes that could be crystallized. Recent developments in structural biology have revolutionized how protein interactions are characterized. The combination of multiple techniques, known as integrative structural biology, has provided insight into how large protein complexes interact in their native environment. In this mini-review, we describe the past, present, and potential future of integrative structural biology as a tool for characterizing protein interactions in their cellular context.

16.
Chem Sci ; 12(36): 12012-12026, 2021 Sep 22.
Article in English | MEDLINE | ID: mdl-34667567

ABSTRACT

Nuclear magnetic resonance (NMR) is one of the primary techniques used to elucidate the chemical structure, bonding, stereochemistry, and conformation of organic compounds. The distinct chemical shifts in an NMR spectrum depend upon each atom's local chemical environment and are influenced by both through-bond and through-space interactions with other atoms and functional groups. The in silico prediction of NMR chemical shifts using quantum mechanical (QM) calculations is now commonplace in aiding organic structural assignment since spectra can be computed for several candidate structures and then compared with experimental values to find the best possible match. However, the computational demands of calculating multiple structural- and stereo-isomers, each of which may typically exist as an ensemble of rapidly-interconverting conformations, are expensive. Additionally, the QM predictions themselves may lack sufficient accuracy to identify a correct structure. In this work, we address both of these shortcomings by developing a rapid machine learning (ML) protocol to predict ¹H and ¹³C chemical shifts through an efficient graph neural network (GNN) using 3D structures as input. Transfer learning with experimental data is used to improve the final prediction accuracy of a model trained using QM calculations. When tested on the CHESHIRE dataset, the proposed model predicts observed ¹³C chemical shifts with comparable accuracy to the best-performing DFT functionals (1.5 ppm) in around 1/6000 of the CPU time. An automated prediction webserver and graphical interface are accessible online at http://nova.chem.colostate.edu/cascade/.
We further demonstrate the model in three applications: first, we use the model to decide the correct organic structure from candidates through experimental spectra, including complex stereoisomers; second, we automatically detect and revise incorrect chemical shift assignments in a popular NMR database, the NMRShiftDB; and third, we use NMR chemical shifts as descriptors for determination of the sites of electrophilic aromatic substitution.

17.
ACS Synth Biol ; 10(11): 2968-2981, 2021 11 19.
Article in English | MEDLINE | ID: mdl-34636549

ABSTRACT

Optimizing the metabolism of microbial cell factories for yields and titers is a critical step for economically viable production of bioproducts and biofuels. In this process, tuning the expression of individual enzymes to obtain the desired pathway flux is a challenging step, in which data from separate multiomics techniques must be integrated with existing biological knowledge to determine where changes should be made. Following a design-build-test-learn strategy, building on recent advances in Bayesian metabolic control analysis, we identify key enzymes in the oleaginous yeast Yarrowia lipolytica that correlate with the production of itaconate by integrating a metabolic model with multiomics measurements. To this end, we quantify the uncertainty for a variety of key parameters, known as flux control coefficients (FCCs), needed to improve the bioproduction of target metabolites and statistically obtain key correlations between the measured enzymes and boundary flux. Based on the top five significant FCCs and five correlated enzymes, our results identify phosphoglycerate mutase, acetyl-CoA synthetase (ACSm), carbonic anhydrase (HCO3E), pyrophosphatase (PPAm), and homoserine dehydrogenase (HSDxi) as enzymes in rate-limiting reactions that can lead to increased itaconic acid production.


Subject(s)
Yarrowia/metabolism , Acetate-CoA Ligase/metabolism , Acetyl Coenzyme A/metabolism , Bayes Theorem , Biofuels/microbiology , Carbonic Anhydrases/metabolism , Homoserine Dehydrogenase/metabolism , Metabolic Engineering/methods , Pyrophosphatases/metabolism
18.
Methods Mol Biol ; 2096: 165-177, 2020.
Article in English | MEDLINE | ID: mdl-32720154

ABSTRACT

As genetic engineering of organisms has grown easier and more precise, computational modeling of metabolic systems has played an increasingly important role in both guiding experimental interventions and in understanding the results of metabolic perturbations.


Subject(s)
Metabolic Flux Analysis/methods , Software , Escherichia coli/drug effects , Escherichia coli/growth & development , Glucose/metabolism , Metabolic Networks and Pathways/drug effects , Models, Biological , Oxygen/pharmacology , Phenotype , Sorbic Acid/analogs & derivatives , Sorbic Acid/metabolism
19.
Nat Commun ; 11(1): 3066, 2020 Jun 11.
Article in English | MEDLINE | ID: mdl-32528011

ABSTRACT

An amendment to this paper has been published and can be accessed via a link at the top of the paper.

20.
Nat Commun ; 11(1): 2328, 2020 05 11.
Article in English | MEDLINE | ID: mdl-32393773

ABSTRACT

Bond dissociation enthalpies (BDEs) of organic molecules play a fundamental role in determining chemical reactivity and selectivity. However, BDE computations at sufficiently high levels of quantum mechanical theory require substantial computing resources. In this paper, we develop a machine learning model capable of accurately predicting BDEs for organic molecules in a fraction of a second. We perform automated density functional theory (DFT) calculations at the M06-2X/def2-TZVP level of theory for 42,577 small organic molecules, resulting in 290,664 BDEs. A graph neural network trained on a subset of these results achieves a mean absolute error of 0.58 kcal mol⁻¹ (vs DFT) for BDEs of unseen molecules. We further demonstrate the model on two applications: first, we rapidly and accurately predict major sites of hydrogen abstraction in the metabolism of drug-like molecules, and second, we determine the dominant molecular fragmentation pathways during soot formation.
