ABSTRACT
MOTIVATION: Applications in synthetic and systems biology can benefit from measuring whole-cell response to biochemical perturbations. Executing experiments to cover all possible combinations of perturbations is infeasible. In this paper, we present the host response model (HRM), a machine learning approach that maps the responses to single perturbations onto the transcriptional response to a combination of perturbations. RESULTS: The HRM combines high-throughput sequencing with machine learning to infer links between experimental context, prior knowledge of cell regulatory networks, and RNASeq data to predict a gene's dysregulation. We find that the HRM can predict the directionality of dysregulation in response to a combination of inducers with an accuracy of >90% using data from single inducers. We further find that the use of prior, known cell regulatory networks doubles the predictive performance of the HRM (R² rises from 0.3 to 0.65). The model was validated in two organisms, Escherichia coli and Bacillus subtilis, using new experiments conducted after training. Finally, while the HRM is trained with gene expression data, the direct prediction of differential expression makes it possible to also conduct enrichment analyses using its predictions. We show that the HRM can accurately classify >95% of the pathway regulations. The HRM reduces the number of RNASeq experiments needed, as responses can be tested in silico prior to the experiment. AVAILABILITY AND IMPLEMENTATION: The HRM software and tutorial are available at https://github.com/sd2e/CDM and the configurable differential expression analysis tools and tutorials are available at https://github.com/SD2E/omics_tools. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Machine Learning , Software , Systems Biology , Escherichia coli/genetics , High-Throughput Nucleotide Sequencing
ABSTRACT
Centralized facilities for genetic engineering, or "biofoundries", offer the potential to design organisms to address emerging needs in medicine, agriculture, industry, and defense. The field has seen rapid advances in technology, but it is difficult to gauge current capabilities or identify gaps across projects. To this end, our foundry was assessed via a timed "pressure test", in which 3 months were given to build organisms to produce 10 molecules unknown to us in advance. By applying a diversity of new approaches, we produced the desired molecule or a closely related one for six out of 10 targets during the performance period and made advances toward production of the others as well. Specifically, we increased the titers of 1-hexadecanol, pyrrolnitrin, and pacidamycin D, found novel routes to the enediyne warhead underlying powerful antimicrobials, established a cell-free system for monoterpene production, produced an intermediate toward vincristine biosynthesis, and encoded 7802 individually retrievable pathways to 540 bisindoles in a DNA pool. Pathways to tetrahydrofuran and barbamide were designed and constructed, but toxicity or analytical tools inhibited further progress. In sum, we constructed 1.2 Mb of DNA, built 215 strains spanning five species (Saccharomyces cerevisiae, Escherichia coli, Streptomyces albidoflavus, Streptomyces coelicolor, and Streptomyces albovinaceus), established two cell-free systems, and performed 690 assays developed in-house for the molecules.
Subject(s)
Escherichia coli/genetics , Genetic Engineering , Saccharomyces cerevisiae/genetics , Streptomyces/genetics , Aminoglycosides/biosynthesis , Aminoglycosides/chemistry , Carbazoles/chemistry , Carbazoles/metabolism , Computational Biology , Cyclohexane Monoterpenes , Enediynes/chemistry , Escherichia coli/metabolism , Fatty Alcohols/chemistry , Fatty Alcohols/metabolism , Furans/chemistry , Furans/metabolism , Lactones/chemistry , Lactones/metabolism , Molecular Structure , Monoterpenes/chemistry , Monoterpenes/metabolism , Peptides/chemistry , Pressure , Pyrimidine Nucleosides/biosynthesis , Pyrimidine Nucleosides/chemistry , Pyrrolnitrin/biosynthesis , Pyrrolnitrin/chemistry , Saccharomyces cerevisiae/metabolism , Streptomyces/metabolism , Thiazoles/chemistry , Thiazoles/metabolism , Time Factors , Vincristine/biosynthesis , Vincristine/chemistry
ABSTRACT
Metabolic network models describing growth of Escherichia coli on glucose, glycerol and acetate were derived from a genome-scale model of E. coli. One of the uncertainties in the metabolic networks is the exact stoichiometry of energy-generating and energy-consuming processes. Accurate estimation of biomass and product yields requires correct information on the ATP stoichiometry. The unknown ATP stoichiometry parameters of the constructed E. coli network were estimated from experimental data of eight different aerobic chemostat experiments carried out with E. coli MG1655, grown at different dilution rates (0.025, 0.05, 0.1, and 0.3 h⁻¹) and on different carbon substrates (glucose, glycerol, and acetate). Proper estimation of the ATP stoichiometry requires proper information on the biomass composition of the organism as well as accurate assessment of net conversion rates under well-defined conditions. For this purpose a growth-rate-dependent biomass composition was derived, based on measurements and literature data. After incorporation of the growth-rate-dependent biomass composition in a metabolic network model, an effective P/O ratio of 1.49 ± 0.26 mol of ATP/mol of O, K_X (growth-dependent maintenance) of 0.46 ± 0.27 mol of ATP/C-mol of biomass and m_ATP (growth-independent maintenance) of 0.075 ± 0.015 mol of ATP/C-mol of biomass/h were estimated using a newly developed Comprehensive Data Reconciliation (CDR) method, assuming that the three energetic parameters were independent of the growth rate and the substrate used. The resulting metabolic network model only requires the specific growth rate, μ, as an input in order to accurately predict all other fluxes and yields.
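As a rough illustration of how the two maintenance parameters act, the sketch below evaluates the maintenance-related ATP demand at the four dilution rates listed in the abstract. It assumes the coefficients combine linearly in a Pirt-type relation (growth-dependent coefficient times specific growth rate, plus the growth-independent term); the abstract does not state this equation, so the functional form and the function names are assumptions, and the uncertainty ranges are ignored. In a steady-state chemostat the specific growth rate equals the dilution rate.

```python
# Point estimates quoted in the abstract (uncertainties ignored here).
K_X = 0.46     # growth-dependent maintenance, mol ATP / C-mol biomass
M_ATP = 0.075  # growth-independent maintenance, mol ATP / C-mol biomass / h

# Dilution rates of the chemostat experiments, in 1/h.
DILUTION_RATES = [0.025, 0.05, 0.1, 0.3]


def maintenance_atp_demand(mu):
    """ATP demand (mol ATP / C-mol biomass / h) at specific growth rate mu.

    ASSUMPTION: a linear Pirt-type relation, K_X * mu + M_ATP.
    """
    if mu < 0:
        raise ValueError("specific growth rate must be non-negative")
    return K_X * mu + M_ATP


def growth_independent_fraction(mu):
    """Share of that demand due to the growth-independent term."""
    return M_ATP / maintenance_atp_demand(mu)


if __name__ == "__main__":
    for mu in DILUTION_RATES:
        q = maintenance_atp_demand(mu)
        frac = growth_independent_fraction(mu)
        print(f"mu = {mu:5.3f} 1/h  demand = {q:.4f}  growth-independent share = {frac:.2f}")
```

Under this assumed form, the growth-independent term dominates at the lowest dilution rate and accounts for roughly a third of the demand at the highest one.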
Subject(s)
Adenosine Triphosphate/metabolism , Energy Metabolism/genetics , Escherichia coli K12/genetics , Escherichia coli K12/metabolism , Genome, Bacterial , Acetic Acid/metabolism , Biomass , Glucose/metabolism , Glycerol/metabolism , Models, Biological
ABSTRACT
Multiple input changes can cause unwanted switching variations, or glitches, in the output of genetic combinational circuits. These glitches can have drastic effects if the output of the circuit causes irreversible changes within or with other cells such as a cascade of responses, apoptosis, or the release of a pharmaceutical in an off-target tissue. Therefore, avoiding unwanted variation of a circuit's output can be crucial for the safe operation of a genetic circuit. This paper investigates what causes unwanted switching variations in combinational genetic circuits using hazard analysis and a new dynamic model generator. The analysis is done in previously built and modeled genetic circuits with known glitching behavior. The dynamic models generated not only predict the same steady states as previous models but can also predict the unwanted switching variations that have been observed experimentally. Multiple input changes may cause glitches due to propagation delays within the circuit. Modifying the circuit's layout to alter these delays may change the likelihood of certain glitches, but it cannot eliminate the possibility that the glitch may occur. In other words, function hazards cannot be eliminated. Instead, they must be avoided by restricting the allowed input changes to the system. Logic hazards, on the other hand, can be avoided using hazard-free logic synthesis. This paper demonstrates this by showing how a circuit designed using a popular genetic design automation tool can be redesigned to eliminate logic hazards.
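The notion of a function hazard can be made concrete with a small check over a Boolean function. The sketch below uses the textbook definitions from asynchronous-circuit hazard theory, not the paper's dynamic model generator: for an input change from A to B, the transition cube is the set of input vectors reachable when the changing inputs switch in any order. A static function hazard exists when the output is the same at A and B but differs somewhere in the cube; a dynamic function hazard exists when the output differs between A and B and some path through the cube makes it change more than once. The function names and the example gates are illustrative assumptions.

```python
from itertools import product


def transition_cube(a, b):
    """All input vectors that agree with a and b wherever a and b agree."""
    choices = [(x,) if x == y else (x, y) for x, y in zip(a, b)]
    return list(product(*choices))


def has_static_function_hazard(f, a, b):
    """True if f(a) == f(b) but some point in the cube gives another value."""
    if f(*a) != f(*b):
        return False
    return any(f(*c) != f(*a) for c in transition_cube(a, b))


def has_dynamic_function_hazard(f, a, b):
    """True if f(a) != f(b) and the output can reach f(b), fall back to
    f(a), and only then settle at f(b) along some ordering of changes."""
    if f(*a) == f(*b):
        return False
    for c in transition_cube(a, b):
        if f(*c) != f(*b):
            continue
        if any(f(*d) == f(*a) for d in transition_cube(c, b)):
            return True
    return False


def and_gate(x, y):
    return x & y


def xor3(x, y, z):
    return x ^ y ^ z


if __name__ == "__main__":
    # Both inputs of an AND gate change at once: output is 0 before and
    # after, but passes through (1, 1) if x rises before y falls.
    print(has_static_function_hazard(and_gate, (0, 1), (1, 0)))
    # A single input change never has a function hazard.
    print(has_static_function_hazard(and_gate, (0, 1), (1, 1)))
    # Three simultaneous changes on a 3-input XOR.
    print(has_dynamic_function_hazard(xor3, (0, 0, 0), (1, 1, 1)))
```

Because these checks depend only on the truth table and the input change, no re-implementation of the same function can remove a hazard they report; it can only be avoided by disallowing that input change, which matches the abstract's conclusion about function hazards.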