1.
Acc Chem Res ; 54(4): 827-836, 2021 Feb 16.
Article in English | MEDLINE | ID: mdl-33534534

ABSTRACT

Machine-readable chemical structure representations are foundational in all attempts to harness machine learning for the prediction of reactivities, selectivities, and chemical properties directly from molecular structure. The featurization of discrete chemical structures into a continuous vector space is a critical phase undertaken before model selection, and the development of new ways to quantitatively encode molecules is an active area of research. In this Account, we highlight the application and suitability of different representations, from expert-guided "engineered" descriptors to automatically "learned" features, in different prediction tasks relevant to organic and organometallic chemistry, where differing amounts of training data are available. These tasks include statistical models of stereo- and enantioselectivity, thermochemistry, and kinetics developed using experimental and quantum chemical data. The use of expert-guided molecular descriptors provides an opportunity to incorporate chemical knowledge, domain expertise, and physical constraints into statistical modeling. In applications to stereoselective organic and organometallic catalysis, where data sets may be relatively small and 3D-geometries and conformations play an important role, mechanistically informed features can be used successfully to obtain predictive statistical models that are also chemically interpretable. We provide an overview of several recent applications of this approach to obtain quantitative models for reactivity and selectivity, where topological descriptors, quantum mechanical calculations of electronic and steric properties, along with conformational ensembles, all feature as essential ingredients of the molecular representations used. Alternatively, more flexible, general-purpose molecular representations such as attributed molecular graphs can be used with machine learning approaches to learn the complex relationship between a structure and prediction target. This approach has the potential to out-perform more traditional representation methods such as "hand-crafted" molecular descriptors, particularly as data set sizes grow. One area where this is particularly relevant is in the use of large sets of quantum mechanical data to train quantitative structure-property relationships. A general approach toward curating useful data sets and training highly accurate graph neural network models is discussed in the context of organic bond dissociation enthalpies, where this strategy outperforms regression using precomputed descriptors. Finally, we describe how graph neural network predictions can be incorporated into mechanistically informed statistical models of chemical reactivity and selectivity. Once trained, this approach avoids the expensive computational overhead associated with quantum mechanical calculations, while maintaining chemical interpretability. We illustrate examples for which fast predictions of bond dissociation enthalpy and of the identities of radicals formed through cleavage of a molecule's weakest bond are used in simple physical models of site-selectivity and reactivity.

2.
Proc Natl Acad Sci U S A ; 116(52): 26421-26430, 2019 Dec 26.
Article in English | MEDLINE | ID: mdl-31843899

ABSTRACT

Lignocellulosic biomass offers a renewable carbon source which can be anaerobically digested to produce short-chain carboxylic acids. Here, we assess fuel properties of oxygenates accessible from catalytic upgrading of these acids a priori for their potential to serve as diesel bioblendstocks. Ethers derived from C2 and C4 carboxylic acids are identified as advantaged fuel candidates with significantly improved ignition quality (>56% cetane number increase) and reduced sooting (>86% yield sooting index reduction) when compared to commercial petrodiesel. The prescreening process informed conversion pathway selection toward a C11 branched ether, 4-butoxyheptane, which showed promise for fuel performance and health- and safety-related attributes. A continuous, solvent-free production process was then developed using metal oxide acidic catalysts to provide improved thermal stability, water tolerance, and yields. Liter-scale production of 4-butoxyheptane enabled fuel property testing to confirm predicted fuel properties, while incorporation into petrodiesel at 20 vol % demonstrated 10% improvement in ignition quality and 20% reduction in intrinsic sooting tendency. Storage stability of the pure bioblendstock and 20 vol % blend was confirmed with a common fuel antioxidant, as was compatibility with elastomeric components within existing engine and fueling infrastructure. Technoeconomic analysis of the conversion process identified major cost drivers to guide further research and development. Life-cycle analysis determined the potential to reduce greenhouse gas emissions by 50 to 271% relative to petrodiesel, depending on treatment of coproducts.

3.
PLoS Comput Biol ; 15(11): e1007424, 2019 Nov.
Article in English | MEDLINE | ID: mdl-31682600

ABSTRACT

Modern biological tools generate a wealth of data on metabolite and protein concentrations that can be used to help inform new strain designs. However, learning from these data to predict how a cell will respond to genetic changes, a key need for engineering, remains challenging. A promising technique for leveraging omics measurements in metabolic modeling involves the construction of kinetic descriptions of the enzymatic reactions that occur within a cell. Parameterizing these models from biological data can be computationally difficult, since methods must also quantify the uncertainty in model parameters resulting from the observed data. While the field of Bayesian inference offers a wide range of methods for efficiently estimating distributions in parameter uncertainty, such techniques are poorly suited to traditional kinetic models due to their complex rate laws and resulting nonlinear dynamics. In this paper, we employ linear-logarithmic kinetics to simplify the calculation of steady-state flux distributions and enable efficient sampling and inference methods. We demonstrate that detailed information on the posterior distribution of parameters can be obtained efficiently at a variety of problem scales, including nearly genome-scale kinetic models trained on multiomics datasets. These results allow modern Bayesian machine learning tools to be leveraged in understanding biological data and in developing new, efficient strain designs.
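
The simplification that linear-logarithmic kinetics brings can be made concrete with a short derivation. The notation below is an assumption chosen for illustration and does not come from the abstract: N is a full-row-rank (reduced) stoichiometric matrix, starred quantities denote a reference steady state, \hat{e} = e/e^{*} is the relative enzyme level, x and y are internal and external metabolite concentrations, and E_x, E_y are elasticity matrices.

```latex
% Lin-log rate law, expanded around the reference state (assumed notation)
\[
  v \;=\; \operatorname{diag}(\hat{e} \circ v^{*})
  \left( \mathbf{1} + E_x \,\chi + E_y \,\gamma \right),
  \qquad
  \chi = \ln\frac{x}{x^{*}}, \quad \gamma = \ln\frac{y}{y^{*}}
\]

% The steady-state condition N v = 0 is linear in \chi, so it can be solved directly
\[
  N \operatorname{diag}(\hat{e} \circ v^{*}) \left( \mathbf{1} + E_x \,\chi + E_y \,\gamma \right) = 0
  \;\;\Longrightarrow\;\;
  \chi = -\left( N \operatorname{diag}(\hat{e} \circ v^{*})\, E_x \right)^{-1}
         N \operatorname{diag}(\hat{e} \circ v^{*}) \left( \mathbf{1} + E_y \,\gamma \right)
\]
```

Because the steady state follows from a single linear solve rather than from integrating nonlinear rate laws, each evaluation of a candidate parameter set is cheap, which is what makes sampling-based inference practical under these assumptions.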


Subjects
Enzymes/metabolism, Metabolism/physiology, Metabolomics/methods, Algorithms, Bayes Theorem, Genomics/methods, Kinetics, Machine Learning, Metabolic Engineering/statistics & numerical data, Biological Models
4.
J Chem Phys ; 150(23): 234111, 2019 Jun 21.
Article in English | MEDLINE | ID: mdl-31228909

ABSTRACT

Machine learning methods have shown promise in predicting molecular properties, and given sufficient training data, machine learning approaches can enable rapid high-throughput virtual screening of large libraries of compounds. Graph-based neural network architectures have emerged in recent years as the most successful approach for predictions based on molecular structure and have consistently achieved the best performance on benchmark quantum chemical datasets. However, these models have typically required optimized 3D structural information for the molecule to achieve the highest accuracy. These 3D geometries are costly to compute for high levels of theory, limiting the applicability and practicality of machine learning methods in high-throughput screening applications. In this study, we present a new database of candidate molecules for organic photovoltaic applications, comprising approximately 91 000 unique chemical structures. Compared to existing datasets, this dataset contains substantially larger molecules (up to 200 atoms) as well as extrapolated properties for long polymer chains. We show that message-passing neural networks trained with and without 3D structural information for these molecules achieve similar accuracy, comparable to state-of-the-art methods on existing benchmark datasets. These results therefore emphasize that for larger molecules with practical applications, near-optimal prediction results can be obtained without using optimized 3D geometry as an input. We further show that learned molecular representations can be leveraged to reduce the training data required to transfer predictions to a new density functional theory functional.
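
To illustrate why a message-passing network can operate without 3D geometry, here is a minimal sketch of one message-passing round on a molecular graph defined only by atoms and bonds. It is a toy under stated assumptions: the two-element atom features, the sum aggregation, and the function names are hypothetical, and a real network would use learned update functions rather than plain addition.

```python
def message_passing_step(node_states, bonds):
    """Return updated atom states after one round of neighbor aggregation.

    Each atom adds the sum of its bonded neighbors' states to its own state.
    Only connectivity (the bond list) is used; no 3D coordinates are needed.
    """
    dim = len(node_states[0])
    messages = [[0.0] * dim for _ in node_states]
    for i, j in bonds:  # bonds are undirected, so messages flow both ways
        for k in range(dim):
            messages[i][k] += node_states[j][k]
            messages[j][k] += node_states[i][k]
    return [
        [own + msg for own, msg in zip(state, message)]
        for state, message in zip(node_states, messages)
    ]


def readout(node_states):
    """Sum atom states into a single molecule-level vector."""
    dim = len(node_states[0])
    return [sum(state[k] for state in node_states) for k in range(dim)]


# Heavy atoms of ethanol (C-C-O) with hypothetical features [is_carbon, is_oxygen].
atoms = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
bonds = [(0, 1), (1, 2)]

updated = message_passing_step(atoms, bonds)
molecule_vector = readout(updated)
```

After one round, each atom's state already encodes its immediate bonding environment; stacking several rounds widens that neighborhood, which is how such models can capture structure from topology alone.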

5.
Proc Natl Acad Sci U S A ; 113(16): 4512-7, 2016 Apr 19.
Article in English | MEDLINE | ID: mdl-27044085

ABSTRACT

In the mammalian suprachiasmatic nucleus (SCN), noisy cellular oscillators communicate within a neuronal network to generate precise system-wide circadian rhythms. Although the intracellular genetic oscillator and intercellular biochemical coupling mechanisms have been examined previously, the network topology driving synchronization of the SCN has not been elucidated. This network has been particularly challenging to probe, due to its oscillatory components and slow coupling timescale. In this work, we investigated the SCN network at a single-cell resolution through a chemically induced desynchronization. We then inferred functional connections in the SCN by applying the maximal information coefficient statistic to bioluminescence reporter data from individual neurons while they resynchronized their circadian cycling. Our results demonstrate that the functional network of circadian cells associated with resynchronization has small-world characteristics, with a node degree distribution that is exponential. We show that hubs of this small-world network are preferentially located in the central SCN, with sparsely connected shells surrounding these cores. Finally, we used two computational models of circadian neurons to validate our predictions of network structure.


Subjects
Circadian Clocks/physiology, Nerve Net/metabolism, Suprachiasmatic Nucleus/metabolism, Animals, Reporter Genes, Transgenic Mice, Nerve Net/cytology, Suprachiasmatic Nucleus/cytology
6.
Appl Environ Microbiol ; 83(17), 2017 Sep 01.
Article in English | MEDLINE | ID: mdl-28625987

ABSTRACT

Actinobacillus succinogenes, a Gram-negative facultative anaerobe, exhibits the native capacity to convert pentose and hexose sugars to succinic acid (SA) with high yield as a tricarboxylic acid (TCA) cycle intermediate. In addition, A. succinogenes is capnophilic, incorporating CO2 into SA, making this organism an ideal candidate host for conversion of lignocellulosic sugars and CO2 to an emerging commodity bioproduct sourced from renewable feedstocks. In this work, we report the development of facile metabolic engineering capabilities in A. succinogenes, enabling examination of SA flux determinants via knockout of the primary competing pathways (namely, acetate and formate production) and overexpression of the key enzymes in the reductive branch of the TCA cycle leading to SA. Batch fermentation experiments with the wild-type and engineered strains using pentose-rich sugar streams demonstrate that the overexpression of the SA biosynthetic machinery (in particular, the enzyme malate dehydrogenase) enhances flux to SA. Additionally, removal of competitive carbon pathways leads to higher-purity SA but also triggers the generation of by-products not previously described from this organism (e.g., lactic acid). The resultant engineered strains also lend insight into energetic and redox balance and elucidate mechanisms governing organic acid biosynthesis in this important natural SA-producing microbe. IMPORTANCE: Succinic acid production from lignocellulosic residues is a potential route for enhancing the economic feasibility of modern biorefineries. Here, we employ facile genetic tools to systematically manipulate competing acid production pathways and overexpress the succinic acid-producing machinery in Actinobacillus succinogenes. Furthermore, the resulting strains are evaluated via fermentation on relevant pentose-rich sugar streams representative of those from corn stover. Overall, this work demonstrates genetic modifications that can lead to succinic acid production improvements and identifies key flux determinants and new bottlenecks and energetic needs when removing by-product pathways in A. succinogenes metabolism.


Subjects
Actinobacillus/genetics, Actinobacillus/metabolism, Succinic Acid/metabolism, Bioreactors/microbiology, Fermentation, Formates/metabolism, Glucose/metabolism, Metabolic Engineering
7.
Proc Natl Acad Sci U S A ; 111(5): 2040-5, 2014 Feb 04.
Article in English | MEDLINE | ID: mdl-24449901

ABSTRACT

Posttranslational regulation of clock proteins is an essential part of mammalian circadian rhythms, conferring sensitivity to metabolic state and offering promising targets for pharmacological control. Two such regulators, casein kinase 1 (CKI) and F-box and leucine-rich repeat protein 3 (FBXL3), modulate the stability of closely linked core clock proteins period (PER) and cryptochrome (CRY), respectively. Inhibition of either CKI or FBXL3 leads to longer periods, and their effects are independent despite targeting proteins with similar roles in clock function. A mechanistic understanding of this independence, however, has remained elusive. Our analysis of cellular circadian clock gene reporters further differentiated between the actions of CKI and FBXL3 by revealing opposite amplitude responses from each manipulation. To understand the functional relationship between the CKI-PER and FBXL3-CRY pathways, we generated robust mechanistic predictions by applying a bootstrap uncertainty analysis to multiple mathematical circadian models. Our results indicate that CKI primarily regulates the accumulating phase of the PER-CRY repressive complex by controlling the nuclear import rate, whereas FBXL3 separately regulates the duration of transcriptional repression in the nucleus. Dynamic simulations confirmed that this spatiotemporal separation is able to reproduce the independence of the two regulators in period regulation, as well as their opposite amplitude effect. As a result, this study provides further insight into the molecular clock machinery responsible for maintaining robust circadian rhythms.
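
As a generic illustration of bootstrap uncertainty analysis, the sketch below computes a percentile bootstrap confidence interval for a mean. It is not the authors' procedure, which applies bootstrapping to mathematical circadian models; the period values, function name, and parameters here are hypothetical.

```python
import random
import statistics


def bootstrap_ci(samples, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `samples`."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n = len(samples)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.mean(rng.choices(samples, k=n)) for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical circadian period estimates in hours (illustrative values only).
periods = [23.8, 24.1, 24.4, 23.9, 24.6, 24.0, 24.3, 23.7]
low, high = bootstrap_ci(periods)
```

The width of the resulting interval indicates how strongly a conclusion depends on the particular data sample, which is the sense in which bootstrapping supports robust predictions.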


Subjects
Circadian Clocks, Cryptochromes/metabolism, Mammals/metabolism, Period Circadian Proteins/metabolism, Post-Translational Protein Processing, Spatio-Temporal Analysis, Adenine/analogs & derivatives, Adenine/pharmacology, Animals, Carbazoles/pharmacology, Circadian Clocks/drug effects, F-Box Proteins/metabolism, Reporter Genes, HEK293 Cells, Humans, Biological Models, Post-Translational Protein Processing/drug effects, Signal Transduction/drug effects, Sulfonamides/pharmacology, Time Factors
8.
PLoS Comput Biol ; 11(11): e1004451, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26588000

ABSTRACT

Stochastic noise at the cellular level has been shown to play a fundamental role in circadian oscillations, influencing how groups of cells entrain to external cues and likely serving as the mechanism by which cell-autonomous rhythms are generated. Despite this importance, few studies have investigated how clock perturbations affect stochastic noise, even as increasing numbers of high-throughput screens categorize how gene knockdowns or small molecules can change clock period and amplitude. This absence is likely due to the difficulty associated with measuring cell-autonomous stochastic noise directly, which currently requires the careful collection and processing of single-cell data. In this study, we show that the damping rate of population-level bioluminescence recordings can serve as an accurate measure of overall stochastic noise, and one that can be applied to future and existing high-throughput circadian screens. Using cell-autonomous fibroblast data, we first show directly that higher noise at the single-cell level results in faster damping at the population level. Next, we show that the damping rate of cultured cells can be changed in a dose-dependent fashion by small molecule modulators, and confirm that such a change can be explained by single-cell noise using a mathematical model. We further demonstrate the insights that can be gained by applying our method to a genome-wide siRNA screen, revealing that stochastic noise is altered independently from period, amplitude, and phase. Finally, we hypothesize that the unperturbed clock is highly optimized for robust rhythms, as very few gene perturbations are capable of simultaneously increasing amplitude and lowering stochastic noise. Ultimately, this study demonstrates the importance of considering the effect of circadian perturbations on stochastic noise, particularly with regard to the development of small-molecule circadian therapeutics.
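
The link between single-cell noise and population-level damping can be seen in a simplified phase-only model. This is an illustrative assumption rather than the model used in the study: each uncoupled cell i has phase \theta_i with angular frequency \omega, phase diffusion coefficient D, and independent Wiener noise W_i, and all cells start in phase.

```latex
% Noisy phase oscillator for cell i (assumed toy model)
\[
  \theta_i(t) = \omega t + \sqrt{2D}\, W_i(t),
  \qquad
  x_i(t) = A \cos \theta_i(t)
\]

% Since \sqrt{2D} W_i(t) is Gaussian with variance 2Dt, averaging over cells gives
\[
  \bar{x}(t) = \mathbb{E}\left[ x_i(t) \right]
             = A \, e^{-\frac{1}{2}(2Dt)} \cos(\omega t)
             = A \, e^{-D t} \cos(\omega t)
\]
```

In this toy setting the population signal decays at a rate equal to the phase diffusion coefficient, so a larger D (noisier cells) produces faster damping even though no individual oscillator loses amplitude.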


Subjects
Circadian Rhythm/physiology, Computational Biology/methods, Biological Models, Animals, Cultured Cells, Circadian Rhythm/drug effects, Computer Simulation, Gene Knockdown Techniques, Mice, RNA Interference/physiology, Small Interfering RNA/pharmacology, Stochastic Processes
9.
Biophys J ; 107(11): 2712-22, 2014 Dec 02.
Article in English | MEDLINE | ID: mdl-25468350

ABSTRACT

Bioluminescence rhythms from cellular reporters have become the most common method used to quantify oscillations in circadian gene expression. These experimental systems can reveal phase and amplitude change resulting from circadian disturbances, and can be used in conjunction with mathematical models to lend further insight into the mechanistic basis of clock amplitude regulation. However, bioluminescence experiments track the mean output from thousands of noisy, uncoupled oscillators, obscuring the direct effect of a given stimulus on the genetic regulatory network. In many cases, it is unclear whether changes in amplitude are due to individual changes in gene expression level or to a change in coherence of the population. Although such systems can be modeled using explicit stochastic simulations, these models are computationally cumbersome and limit analytical insight into the mechanisms of amplitude change. We therefore develop theoretical and computational tools to approximate the mean expression level in large populations of noninteracting oscillators, and further define computationally efficient amplitude response calculations to describe phase-dependent amplitude change. At the single-cell level, a mechanistic nonlinear ordinary differential equation model is used to calculate the transient response of each cell to a perturbation, whereas population-level dynamics are captured by coupling this detailed model to a phase density function. Our analysis reveals that amplitude changes mediated at either the individual-cell or the population level can be distinguished in tissue-level bioluminescence data without the need for single-cell measurements. We demonstrate the effectiveness of the method by modeling experimental bioluminescence profiles of light-sensitive fibroblasts, reconciling the conclusions of two seemingly contradictory studies. This modeling framework allows a direct comparison between in vitro bioluminescence experiments and in silico ordinary differential equation models, and will lead to a better quantitative understanding of the factors that affect clock amplitude.


Subjects
Circadian Rhythm, Fibroblasts/metabolism, Reporter Genes, Luminescent Measurements, Animals, Gene Expression Profiling, Mice, Biological Models, NIH 3T3 Cells
10.
JACS Au ; 3(1): 113-123, 2023 Jan 23.
Article in English | MEDLINE | ID: mdl-36711088

ABSTRACT

The discovery of new materials in unexplored chemical spaces necessitates quick and accurate prediction of thermodynamic stability, often assessed using density functional theory (DFT), and efficient search strategies. Here, we develop a new approach to finding stable inorganic functional materials. We start by defining an upper bound to the fully relaxed energy obtained via DFT as the energy resulting from a constrained optimization over only cell volume. Because the fractional atomic coordinates for these calculations are known a priori, this upper bound energy can be quickly and accurately predicted with a scale-invariant graph neural network (GNN). We generate new structures via ionic substitution of known prototypes, and train our GNN on a new database of 128 000 DFT calculations comprising both fully relaxed and volume-only relaxed structures. By minimizing the predicted upper-bound energy, we discover new stable structures with over 99% accuracy (versus DFT). We demonstrate the method by finding promising new candidates for solid-state battery (SSB) electrolytes that not only possess the required stability, but also additional functional properties such as large electrochemical stability windows and high conduction ion fraction. We expect this proposed framework to be directly applicable to a wide range of design challenges in materials science.

11.
Macromolecules ; 56(21): 8547-8557, 2023 Nov 14.
Article in English | MEDLINE | ID: mdl-38024155

ABSTRACT

A necessary transformation for a sustainable economy is the transition from fossil-derived plastics to polymers derived from biomass and waste resources. While renewable feedstocks can enhance material performance through unique chemical moieties, probing the vast material design space by experiment alone is not practically feasible. Here, we develop a machine-learning-based tool, PolyID, to reduce the design space of renewable feedstocks to enable efficient discovery of performance-advantaged, biobased polymers. PolyID is a multioutput, graph neural network specifically designed to increase accuracy and to enable quantitative structure-property relationship (QSPR) analysis for polymers. It includes a novel domain-of-validity method that was developed and applied to demonstrate how gaps in training data can be filled to improve accuracy. The model was benchmarked with both a 20% held-out subset of the original training data and 22 experimentally synthesized polymers. A mean absolute error for the glass transition temperatures of 19.8 and 26.4 °C was achieved for the test and experimental data sets, respectively. Predictions were made on polymers composed of monomers from four databases that contain biologically accessible small molecules: MetaCyc, MINEs, KEGG, and BiGG. From 1.4 × 10^6 accessible biobased polymers, we identified five poly(ethylene terephthalate) (PET) analogues with predicted improvements to thermal and transport performance. Experimental validation for one of the PET analogues demonstrated a glass transition temperature between 85 and 112 °C, which is higher than PET and within the predicted range of the PolyID tool. In addition to accurate predictions, we show how the model's predictions are explainable through analysis of individual bond importance for a biobased nylon. Overall, PolyID can aid the biobased polymer practitioner to navigate the vast number of renewable polymers to discover sustainable materials with enhanced performance.

12.
Nat Commun ; 13(1): 4925, 2022 Aug 22.
Article in English | MEDLINE | ID: mdl-35995792

ABSTRACT

Muconic acid is a bioprivileged molecule that can be converted into direct replacement chemicals for incumbent petrochemicals and performance-advantaged bioproducts. In this study, Pseudomonas putida KT2440 is engineered to convert glucose and xylose, the primary carbohydrates in lignocellulosic hydrolysates, to muconic acid using a model-guided strategy to maximize the theoretical yield. Using adaptive laboratory evolution (ALE) and metabolic engineering in a strain engineered to express the D-xylose isomerase pathway, we demonstrate that mutations in the heterologous D-xylose:H+ symporter (XylE), increased expression of a major facilitator superfamily transporter (PP_2569), and overexpression of aroB encoding the native 3-dehydroquinate synthase, enable efficient muconic acid production from glucose and xylose simultaneously. Using the rationally engineered strain, we produce 33.7 g/L muconate at 0.18 g/L/h and a 46% molar yield (92% of the maximum theoretical yield). This engineering strategy is promising for the production of other shikimate pathway-derived compounds from lignocellulosic sugars.


Subjects
Pseudomonas putida, Xylose, Fermentation, Glucose/metabolism, Metabolic Engineering, Pseudomonas putida/genetics, Pseudomonas putida/metabolism, Sorbic Acid/analogs & derivatives, Xylose/metabolism
13.
Chem Sci ; 12(39): 13158-13166, 2021 Oct 13.
Article in English | MEDLINE | ID: mdl-34745547

ABSTRACT

Long-lived organic radicals are promising candidates for the development of high-performance energy solutions such as organic redox batteries, transistors, and light-emitting diodes. However, "stable" organic radicals that remain unreactive for an extended time and that can be stored and handled under ambient conditions are rare. A necessary but not sufficient condition for organic radical stability is the presence of thermodynamic stabilization, such as conjugation with an adjacent π-bond or lone-pair, or hyperconjugation with a σ-bond. However, thermodynamic factors alone do not result in radicals with extended lifetimes: many resonance-stabilized radicals are transient species that exist for less than a millisecond. Kinetic stabilization is also necessary for persistence, such as steric effects that inhibit radical dimerization or reaction with solvent molecules. We describe a quantitative approach to map organic radical stability, using molecular descriptors intended to capture thermodynamic and kinetic considerations. The comparison of an extensive dataset of quantum chemical calculations of organic radicals with experimentally-known stable radical species reveals a region of this feature space where long-lived radicals are located. These descriptors, based upon maximum spin density and buried volume, are combined into a single metric, the radical stability score, that outperforms thermodynamic scales based on bond dissociation enthalpies in identifying remarkably long-lived radicals. This provides an objective and accessible metric for use in future molecular design and optimization campaigns. We demonstrate this approach in identifying Pareto-optimal candidates for stable organic radicals.
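
The idea of Pareto-optimal candidates can be sketched in a few lines. The example below assumes two objectives that are both to be maximized; the candidate names, the scores, and the choice of objectives are hypothetical and are not taken from the study.

```python
def pareto_front(candidates):
    """Return names of candidates that no other candidate dominates.

    Each candidate is (name, objective_a, objective_b), both maximized.
    One candidate dominates another if it is at least as good on both
    objectives and strictly better on at least one.
    """
    front = []
    for name, a, b in candidates:
        dominated = any(
            (a2 >= a and b2 >= b) and (a2 > a or b2 > b)
            for _, a2, b2 in candidates
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical scores, e.g. a stability metric and a second design property.
candidates = [
    ("A", 0.9, 0.2),
    ("B", 0.5, 0.5),
    ("C", 0.4, 0.4),  # worse than B on both objectives
    ("D", 0.2, 0.9),
]
optimal = pareto_front(candidates)
```

Candidates on the front represent the available trade-offs: improving one objective for any of them requires giving up some of the other.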

14.
Comput Struct Biotechnol J ; 19: 214-225, 2021.
Article in English | MEDLINE | ID: mdl-33425253

ABSTRACT

Microorganisms rely on protein interactions to transmit signals, react to stimuli, and grow. One of the best ways to understand these protein interactions is through structural characterization. However, in the past, structural knowledge was limited to stable, high-affinity complexes that could be crystallized. Recent developments in structural biology have revolutionized how protein interactions are characterized. The combination of multiple techniques, known as integrative structural biology, has provided insight into how large protein complexes interact in their native environment. In this mini-review, we describe the past, present, and potential future of integrative structural biology as a tool for characterizing protein interactions in their cellular context.

15.
Front Bioeng Biotechnol ; 9: 707749, 2021.
Article in English | MEDLINE | ID: mdl-34381766

ABSTRACT

Prior engineering of the ethanologen Zymomonas mobilis has enabled it to metabolize xylose and to produce 2,3-butanediol (2,3-BDO) as a dominant fermentation product. When co-fermenting with xylose, glucose is preferentially utilized, even though xylose metabolism generates ATP more efficiently during 2,3-BDO production on a BDO-mol basis. To gain a deeper understanding of Z. mobilis metabolism, we first estimated the kinetic parameters of the glucose facilitator protein of Z. mobilis by fitting a kinetic uptake model, which shows that the maximum transport capacity for glucose is seven times higher than that for xylose, and that the transporter's affinity for glucose is six times higher than for xylose. With these estimated kinetic parameters, we further compared the thermodynamic driving force and enzyme protein cost of glucose and xylose metabolism. We find that, although 20% more ATP can be yielded stoichiometrically during xylose utilization, glucose metabolism is thermodynamically more favorable, with a 6% greater cumulative Gibbs free energy change, and more economical, with 37% less enzyme cost required at the initial stage, and that it sustains these advantages in thermodynamic driving force and protein cost throughout the fermentation process until glucose is exhausted. Glucose-6-phosphate dehydrogenase (g6pdh), glyceraldehyde-3-phosphate dehydrogenase (gapdh) and phosphoglycerate mutase (pgm) are identified as thermodynamic bottlenecks in the glucose utilization pathway, along with two additional enzymes, xylose isomerase and ribulose-5-phosphate epimerase, in xylose metabolism. Acetolactate synthase is identified as a potential engineering target for optimizing the protein cost required to support unit metabolic flux. Pathway analysis was then extended to the core stoichiometric matrix of Z. mobilis metabolism. Growth was simulated by dynamic flux balance analysis (FBA), and the model was validated, showing good agreement with experimental data. Dynamic FBA simulations suggest that high agitation is preferable to increase 2,3-BDO productivity, while moderate agitation will benefit the 2,3-BDO titer. Taken together, this work provides thermodynamic and kinetic insights into Z. mobilis metabolism on dual substrates, and guidance for bioengineering efforts to increase hydrocarbon fuel production.
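
To show how such transport parameters translate into preferential glucose use, here is a minimal sketch of two sugars competing for one transporter. The competitive Michaelis-Menten form, the arbitrary units, and reading "six times higher affinity" as a six-fold lower half-saturation constant are all assumptions made for illustration; the abstract does not specify the fitted model.

```python
def competitive_uptake(substrate, vmax, km, competitor, km_competitor):
    """Michaelis-Menten uptake rate with a competing substrate (assumed form)."""
    return vmax * substrate / (km * (1.0 + competitor / km_competitor) + substrate)


# Hypothetical parameters in arbitrary units, keeping only the reported ratios.
VMAX_XYL = 1.0
VMAX_GLC = 7.0 * VMAX_XYL  # seven-fold higher transport capacity for glucose
KM_GLC = 1.0
KM_XYL = 6.0 * KM_GLC      # six-fold lower affinity for xylose (assumed as Km ratio)

glucose = xylose = 10.0    # equal external concentrations

v_glc = competitive_uptake(glucose, VMAX_GLC, KM_GLC, xylose, KM_XYL)
v_xyl = competitive_uptake(xylose, VMAX_XYL, KM_XYL, glucose, KM_GLC)
```

With equal sugar concentrations, this toy model gives a glucose uptake rate roughly forty times the xylose rate, consistent in direction with the preferential glucose utilization described above.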

16.
Chem Sci ; 12(36): 12012-12026, 2021 Sep 22.
Article in English | MEDLINE | ID: mdl-34667567

ABSTRACT

Nuclear magnetic resonance (NMR) is one of the primary techniques used to elucidate the chemical structure, bonding, stereochemistry, and conformation of organic compounds. The distinct chemical shifts in an NMR spectrum depend upon each atom's local chemical environment and are influenced by both through-bond and through-space interactions with other atoms and functional groups. The in silico prediction of NMR chemical shifts using quantum mechanical (QM) calculations is now commonplace in aiding organic structural assignment since spectra can be computed for several candidate structures and then compared with experimental values to find the best possible match. However, the computational demands of calculating multiple structural- and stereo-isomers, each of which may typically exist as an ensemble of rapidly-interconverting conformations, are expensive. Additionally, the QM predictions themselves may lack sufficient accuracy to identify a correct structure. In this work, we address both of these shortcomings by developing a rapid machine learning (ML) protocol to predict 1H and 13C chemical shifts through an efficient graph neural network (GNN) using 3D structures as input. Transfer learning with experimental data is used to improve the final prediction accuracy of a model trained using QM calculations. When tested on the CHESHIRE dataset, the proposed model predicts observed 13C chemical shifts with comparable accuracy to the best-performing DFT functionals (1.5 ppm) in around 1/6000 of the CPU time. An automated prediction webserver and graphical interface are accessible online at http://nova.chem.colostate.edu/cascade/. We further demonstrate the model in three applications: first, we use the model to decide the correct organic structure from candidates through experimental spectra, including complex stereoisomers; second, we automatically detect and revise incorrect chemical shift assignments in a popular NMR database, the NMRShiftDB; and third, we use NMR chemical shifts as descriptors for determination of the sites of electrophilic aromatic substitution.
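
The candidate-matching step described above can be sketched as a mean-absolute-error comparison between predicted and observed shifts. All shift values and candidate names below are hypothetical, and pairing shifts by sorted order is a simplifying assumption that stands in for a proper atom-by-atom assignment.

```python
def mean_absolute_error(predicted, observed):
    """MAE between two shift lists, paired by sorted order (a simplification)."""
    pairs = zip(sorted(predicted), sorted(observed))
    return sum(abs(p - o) for p, o in pairs) / len(observed)


# Hypothetical 13C shifts in ppm; not taken from any real spectrum.
observed_shifts = [14.1, 22.8, 31.9, 63.0]
predicted_shifts = {
    "candidate_a": [14.0, 23.0, 32.0, 62.5],
    "candidate_b": [10.5, 25.9, 30.2, 68.4],
}

errors = {
    name: mean_absolute_error(shifts, observed_shifts)
    for name, shifts in predicted_shifts.items()
}
best_match = min(errors, key=errors.get)
```

Because each prediction is fast, the same comparison can be repeated over many structural and stereoisomeric candidates, which is where the speed advantage over QM calculations matters most.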

17.
ACS Synth Biol ; 10(11): 2968-2981, 2021 Nov 19.
Article in English | MEDLINE | ID: mdl-34636549

ABSTRACT

Optimizing the metabolism of microbial cell factories for yields and titers is a critical step for economically viable production of bioproducts and biofuels. In this process, tuning the expression of individual enzymes to obtain the desired pathway flux is a challenging step, in which data from separate multiomics techniques must be integrated with existing biological knowledge to determine where changes should be made. Following a design-build-test-learn strategy, building on recent advances in Bayesian metabolic control analysis, we identify key enzymes in the oleaginous yeast Yarrowia lipolytica that correlate with the production of itaconate by integrating a metabolic model with multiomics measurements. To this end, we quantify the uncertainty for a variety of key parameters, known as flux control coefficients (FCCs), needed to improve the bioproduction of target metabolites and statistically obtain key correlations between the measured enzymes and boundary flux. Based on the top five significant FCCs and five correlated enzymes, our results point to phosphoglycerate mutase, acetyl-CoA synthetase (ACSm), carbonic anhydrase (HCO3E), pyrophosphatase (PPAm), and homoserine dehydrogenase (HSDxi) as enzymes in rate-limiting reactions that can lead to increased itaconic acid production.
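
For readers unfamiliar with flux control coefficients, the standard definition from metabolic control analysis is given below. The symbols are assumptions for illustration: J is a steady-state flux of interest and e_i is the level of enzyme i.

```latex
% Flux control coefficient of enzyme i on flux J
\[
  C^{J}_{e_i} \;=\; \frac{\partial \ln J}{\partial \ln e_i}
              \;=\; \frac{e_i}{J}\,\frac{\partial J}{\partial e_i}
\]

% Summation theorem (for reaction rates proportional to enzyme level)
\[
  \sum_{i} C^{J}_{e_i} = 1
\]
```

A coefficient near 1 means that a small fractional increase in that enzyme yields nearly the same fractional increase in flux, while a coefficient near 0 means the enzyme exerts little control; estimating these values with uncertainty is what allows candidate engineering targets to be ranked.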


Subjects
Yarrowia/metabolism, Acetate-CoA Ligase/metabolism, Acetyl Coenzyme A/metabolism, Bayes Theorem, Biofuels/microbiology, Carbonic Anhydrases/metabolism, Homoserine Dehydrogenase/metabolism, Metabolic Engineering/methods, Pyrophosphatases/metabolism
18.
Methods Mol Biol ; 2096: 165-177, 2020.
Article in English | MEDLINE | ID: mdl-32720154

ABSTRACT

As genetic engineering of organisms has grown easier and more precise, computational modeling of metabolic systems has played an increasingly important role in both guiding experimental interventions and in understanding the results of metabolic perturbations.


Subjects
Metabolic Flux Analysis/methods, Software, Escherichia coli/drug effects, Escherichia coli/growth & development, Glucose/metabolism, Metabolic Networks and Pathways/drug effects, Biological Models, Oxygen/pharmacology, Phenotype, Sorbic Acid/analogs & derivatives, Sorbic Acid/metabolism
19.
Nat Commun ; 11(1): 2328, 2020 May 11.
Article in English | MEDLINE | ID: mdl-32393773

ABSTRACT

Bond dissociation enthalpies (BDEs) of organic molecules play a fundamental role in determining chemical reactivity and selectivity. However, BDE computations at sufficiently high levels of quantum mechanical theory require substantial computing resources. In this paper, we develop a machine learning model capable of accurately predicting BDEs for organic molecules in a fraction of a second. We perform automated density functional theory (DFT) calculations at the M06-2X/def2-TZVP level of theory for 42,577 small organic molecules, resulting in 290,664 BDEs. A graph neural network trained on a subset of these results achieves a mean absolute error of 0.58 kcal/mol (vs DFT) for BDEs of unseen molecules. We further demonstrate the model on two applications: first, we rapidly and accurately predict major sites of hydrogen abstraction in the metabolism of drug-like molecules, and second, we determine the dominant molecular fragmentation pathways during soot formation.
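
As a small illustration of how BDEs are defined and then used to locate a likely reactive site, the sketch below computes a BDE from enthalpies and picks the weakest bond from a set of predictions. Every number, bond label, and function name is hypothetical; in practice the per-bond values would come from DFT or from a trained model.

```python
def bond_dissociation_enthalpy(h_parent, h_radical_a, h_radical_b):
    """BDE of A-B: enthalpy of the two radicals minus that of the parent."""
    return h_radical_a + h_radical_b - h_parent


# Hypothetical enthalpies in kcal/mol for a parent molecule and its fragments.
example_bde = bond_dissociation_enthalpy(
    h_parent=-100.0, h_radical_a=-30.0, h_radical_b=20.0
)

# Hypothetical predicted BDEs (kcal/mol) for the bonds of one molecule.
predicted_bdes = {"C1-H": 98.5, "C2-H": 93.2, "O-H": 105.1}

# The weakest bond is a simple first guess for the favored abstraction site.
weakest_bond = min(predicted_bdes, key=predicted_bdes.get)
```

Ranking bonds by predicted BDE is only a first-order physical model of site selectivity, but it shows why sub-second predictions make screening across many molecules feasible.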

20.
Nat Commun ; 11(1): 3066, 2020 Jun 11.
Article in English | MEDLINE | ID: mdl-32528011

ABSTRACT

An amendment to this paper has been published and can be accessed via a link at the top of the paper.
