Results 1 - 6 of 6
1.
Synth Biol (Oxf) ; 8(1): ysad005, 2023.
Article in English | MEDLINE | ID: mdl-37073283

ABSTRACT

Computational tools addressing various components of design-build-test-learn (DBTL) loops for the construction of synthetic genetic networks exist but do not generally cover the entire DBTL loop. This manuscript introduces an end-to-end sequence of tools that together form a DBTL loop called Design Assemble Round Trip (DART). DART provides rational selection and refinement of genetic parts to construct and test a circuit. Computational support for experimental process, metadata management, standardized data collection and reproducible data analysis is provided via the previously published Round Trip (RT) test-learn loop. The primary focus of this work is on the Design Assemble (DA) part of the tool chain, which improves on previous techniques by screening up to thousands of network topologies for robust performance using a novel robustness score derived from dynamical behavior based on circuit topology only. In addition, novel experimental support software is introduced for the assembly of genetic circuits. A complete design-through-analysis sequence is presented using several OR and NOR circuit designs, with and without structural redundancy, that are implemented in budding yeast. The execution of DART tested the predictions of the design tools, specifically with regard to robust and reproducible performance under different experimental conditions. The data analysis depended on a novel application of machine learning techniques to segment bimodal flow cytometry distributions. Evidence is presented that, in some cases, a more complex build may impart more robustness and reproducibility across experimental conditions.

2.
PLoS One ; 17(3): e0265020, 2022.
Article in English | MEDLINE | ID: mdl-35286324

ABSTRACT

Engineered proteins generally must possess a stable structure in order to achieve their designed function. Stable designs, however, are astronomically rare within the space of all possible amino acid sequences. As a consequence, many designs must be tested computationally and experimentally in order to find stable ones, which is expensive in terms of time and resources. Here we use a high-throughput, low-fidelity assay to experimentally evaluate the stability of approximately 200,000 novel proteins. These include a wide range of sequence perturbations, providing a baseline for future work in the field. We build a neural network model that predicts protein stability given only sequences of amino acids, and compare its performance to the assayed values. We also report another network model that is able to generate the amino acid sequences of novel stable proteins given requested secondary sequences. Finally, we show that the predictive model, despite weaknesses including a noisy data set, can be used to substantially increase the stability of both expert-designed and model-generated proteins.


Subject(s)
Neural Networks, Computer , Proteins , Amino Acid Sequence , Amino Acids , Protein Stability , Proteins/chemistry
3.
Bioinformatics ; 38(2): 404-409, 2022 01 03.
Article in English | MEDLINE | ID: mdl-34570169

ABSTRACT

MOTIVATION: Applications in synthetic and systems biology can benefit from measuring whole-cell response to biochemical perturbations. Executing experiments that cover all possible combinations of perturbations is infeasible. In this paper, we present the host response model (HRM), a machine learning approach that maps the responses to single perturbations to the transcriptional response to a combination of perturbations. RESULTS: The HRM combines high-throughput sequencing with machine learning to infer links between experimental context, prior knowledge of cell regulatory networks, and RNASeq data to predict a gene's dysregulation. We find that the HRM can predict the directionality of dysregulation in response to a combination of inducers with an accuracy of >90% using data from single inducers. We further find that the use of prior, known cell regulatory networks doubles the predictive performance of the HRM (R² rises from 0.3 to 0.65). The model was validated in two organisms, Escherichia coli and Bacillus subtilis, using new experiments conducted after training. Finally, while the HRM is trained with gene expression data, the direct prediction of differential expression makes it possible to also conduct enrichment analyses using its predictions. We show that the HRM can accurately classify >95% of the pathway regulations. The HRM reduces the number of RNASeq experiments needed, as responses can be tested in silico prior to the experiment. AVAILABILITY AND IMPLEMENTATION: The HRM software and tutorial are available at https://github.com/sd2e/CDM and the configurable differential expression analysis tools and tutorials are available at https://github.com/SD2E/omics_tools. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
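
The two figures quoted in this abstract, directional accuracy and R², can be illustrated with a minimal sketch. The abstract does not say how either metric was computed, so the definitions below are assumptions: directionality is taken as the sign of a log fold change, and R² is the usual coefficient of determination. The function names and the sample values are hypothetical and are not taken from the HRM software.

```python
def directional_accuracy(observed, predicted):
    """Fraction of genes whose predicted change has the same sign as the observed one.

    Assumption: inputs are log fold changes, so the sign encodes up/down regulation.
    """
    def sign(x):
        return (x > 0) - (x < 0)

    matches = sum(sign(o) == sign(p) for o, p in zip(observed, predicted))
    return matches / len(observed)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Hypothetical log fold changes for four genes (illustrative values only).
observed = [2.0, -1.0, 0.5, -3.0]
predicted = [1.5, -0.5, -0.2, -2.0]

print(directional_accuracy(observed, predicted))  # 3 of 4 signs agree
print(r_squared(observed, predicted))
```

The two metrics answer different questions: the first only checks whether up or down regulation was called correctly, while the second also penalizes errors in magnitude.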


Subject(s)
Machine Learning , Software , Systems Biology , Escherichia coli/genetics , High-Throughput Nucleotide Sequencing
4.
J Chem Inf Model ; 61(4): 1593-1602, 2021 04 26.
Article in English | MEDLINE | ID: mdl-33797887

ABSTRACT

Combinatorial fusion analysis (CFA) is an approach for combining multiple scoring systems using the rank-score characteristic function and cognitive diversity measure. One example is to combine diverse machine learning models to achieve better prediction quality. In this work, we apply CFA to the synthesis of metal halide perovskites containing organic ammonium cations via inverse temperature crystallization. Using a data set generated by high-throughput experimentation, four individual models (support vector machines, random forests, weighted logistic classifier, and gradient boosted trees) were developed. We characterize each of these scoring systems and explore 66 possible combinations of the models. When measured by the precision on predicting crystal formation, the majority of the combination models improve on the individual model results. The best combination models outperform the best individual models by 3.9 percentage points in precision. In addition to improving prediction quality, we demonstrate how the fusion models can be used to identify mislabeled input data and address issues of data quality. In particular, we identify example cases where no single model and no fusion model gives the correct prediction. Experimental replication of these syntheses reveals that these compositions are sensitive to modest temperature variations across the different locations of the heating element that can hinder or enhance the crystallization process. In summary, we demonstrate that model fusion using CFA can not only identify a previously unconsidered influence on reaction outcome but also be used as a form of quality control for high-throughput experimentation.
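
The ingredients named in this abstract (the rank-score characteristic function, a cognitive diversity measure, and score and rank combinations) can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: scores are min-max normalized, combinations are unweighted averages, and diversity is the root-mean-square gap between two rank-score functions. All function names and the sample scores are hypothetical.

```python
from math import sqrt


def normalize(scores):
    """Min-max normalize scores to [0, 1] (assumed preprocessing step)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.5] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def ranks(scores):
    """Rank items by descending score; rank 1 is best, ties broken by index."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    result = [0] * len(scores)
    for position, item in enumerate(order, start=1):
        result[item] = position
    return result


def rank_score_function(scores):
    """Rank-score characteristic: the normalized score found at rank 1, 2, ..., n."""
    return sorted(normalize(scores), reverse=True)


def cognitive_diversity(scores_a, scores_b):
    """RMS distance between the rank-score functions of two scoring systems."""
    fa, fb = rank_score_function(scores_a), rank_score_function(scores_b)
    return sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)) / len(fa))


def score_combination(systems):
    """Average the normalized scores per item (higher is better)."""
    normalized = [normalize(s) for s in systems]
    return [sum(column) / len(column) for column in zip(*normalized)]


def rank_combination(systems):
    """Average the ranks per item (lower is better)."""
    ranked = [ranks(s) for s in systems]
    return [sum(column) / len(column) for column in zip(*ranked)]


# Hypothetical crystal-formation scores from two models for four reactions.
svm_scores = [0.9, 0.2, 0.6, 0.4]
forest_scores = [0.7, 0.1, 0.8, 0.3]

print(score_combination([svm_scores, forest_scores]))
print(rank_combination([svm_scores, forest_scores]))
print(cognitive_diversity(svm_scores, forest_scores))
```

In this sketch, a pair of models with a larger cognitive diversity assigns scores across ranks more differently, which is the property CFA uses to judge whether combining them is likely to help.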


Subject(s)
Machine Learning , Support Vector Machine , Calcium Compounds , Oxides , Titanium
5.
J Phys Chem B ; 125(12): 3057-3065, 2021 04 01.
Article in English | MEDLINE | ID: mdl-33739115

ABSTRACT

Predicting protein stability is a challenge due to the many competing thermodynamic effects. Through de novo protein design, one begins with a target structure and searches for a sequence that will fold into it. Previous work by Rocklin et al. introduced a data set of more than 16,000 miniproteins spanning four structural topologies with information on stability. These structures were characterized with a set of 46 structural descriptors, with no explicit inclusion of configurational entropy (Scnf). Our work focused on creating a set of 17 descriptors intended to capture variations in Scnf and on comparing it with an extended set of 113 structural and energy model features that extend the Rocklin et al. feature set (R). The Scnf descriptors statistically discriminate between stable and unstable distributions within topologies and best describe EEHEE topology stability (where E = β sheet and H = α helix). Between 50 and 80% of the variation in each Scnf descriptor is described by linear combinations of R features. Despite containing useful information about minipeptide stability, providing Scnf features as inputs to machine learning models does not improve overall performance when predicting protein stability, as the R features sufficiently capture the implicit variations.


Subject(s)
Proteins , Entropy , Thermodynamics
6.
J Phys Chem A ; 117(51): 14184-90, 2013 Dec 27.
Article in English | MEDLINE | ID: mdl-24283380

ABSTRACT

The Cambridge Structural Database (CSD) was used to obtain flattening factors to describe the overall anisotropy of nonbonding van der Waals (vdW) contacts between several main group elements. The method for obtaining the flattening factors is based on a novel minimization process. Results show that the vdW contact distances are significantly dependent on the environment and the orientations of the surrounding covalently bonded atoms: head-on vdW contacts are generally shorter than sideways contacts, in overall agreement with earlier results by Nyburg and Faerman (Acta Crystallogr., Sect. B: Struct. Sci. 1985, 41, 274-279). With the exception of Se, we find flattening factors that are somewhat smaller than those found earlier. High-level ab initio quantum chemical calculations using Ar and Ne as a probe also confirm the flattening effect and its dependency on the environment. A dozen popular long-range corrected and dispersion supplemented density functionals are compared with the CCSD(T) data. While several of them perform quite poorly, four DFT-D methods, especially B3LYP-GD3BJ, provided vdW flattening similar to that found by CCSD(T) theory and experiment.
