Results 1 - 14 of 14
1.
Proc Natl Acad Sci U S A ; 118(23), 2021 Jun 08.
Article in English | MEDLINE | ID: mdl-34078670

ABSTRACT

Proteins require high developability, quantified by expression, solubility, and stability, for robust utility as therapeutics, diagnostics, and in other biotechnological applications. Measuring traditional developability metrics is inherently low throughput, which often slows the development pipeline. We evaluated the ability of 10 variations of three high-throughput developability assays to predict the bacterial recombinant expression of paratope variants of the protein scaffold Gp2. Enabled by a phenotype/genotype linkage, assay performance for 10^{5} variants was calculated via deep sequencing of populations sorted by proxied developability. We identified the most informative assay combination via cross-validation accuracy and correlation feature selection and demonstrated the ability of machine learning models to exploit nonlinear mutual information to increase the assays' predictive utility. We trained a random forest model that predicts expression from assay performance; its predictions are 35% closer to the experimental variance, and it trains 80% more efficiently, than a model predicting from sequence information alone. Utilizing the predicted expression, we performed a site-wise analysis and predicted mutations consistent with enhanced developability. The validated assays offer the ability to identify developable proteins at unprecedented scales, reducing the bottleneck of protein commercialization.


Subject(s)
Databases, Nucleic Acid; Gene Library; High-Throughput Screening Assays; Machine Learning; Proteins/genetics
2.
J Chem Phys ; 158(4): 044901, 2023 Jan 28.
Article in English | MEDLINE | ID: mdl-36725501

ABSTRACT

We show that an analogy between crowding in fluid and jammed phases of hard spheres captures the density dependence of the kissing number for a family of numerically generated jammed states. We extend this analogy to jams of mixtures of hard spheres in d = 3 dimensions and, thus, obtain an estimate of the random close packing volume fraction, ϕ_{RCP}, as a function of size polydispersity. We first consider mixtures of particle sizes with discrete distributions. For binary systems, we show agreement between our predictions and simulations using both our own results and results reported in previous studies, as well as agreement with recent experiments from the literature. We then apply our approach to systems with continuous polydispersity using three different particle size distributions, namely, the log-normal, Gamma, and truncated power-law distributions. In all cases, we observe agreement between our theoretical findings and numerical results up to rather large polydispersities for all particle size distributions when using as reference our own simulations and results from the literature. In particular, we find ϕ_{RCP} to increase monotonically with the relative standard deviation, s_{σ}, of the distribution and to saturate at a value that always remains below 1. A perturbative expansion yields a closed-form expression for ϕ_{RCP} that quantitatively captures a distribution-independent regime for s_{σ} < 0.5. Beyond that regime, we show that the gradual loss in agreement is tied to the growth of the skewness of size distributions.

3.
J Chem Phys ; 159(15), 2023 Oct 21.
Article in English | MEDLINE | ID: mdl-37861121

ABSTRACT

Data-driven interatomic potentials (IPs) trained on large collections of first principles calculations are rapidly becoming essential tools in the fields of computational materials science and chemistry for performing atomic-scale simulations. Despite this, apart from a few notable exceptions, there is a distinct lack of well-organized, public datasets in common formats available for use with IP development. This deficiency precludes the research community from implementing widespread benchmarking, which is essential for gaining insight into model performance and transferability, and also limits the development of more general, or even universal, IPs. To address this issue, we introduce the ColabFit Exchange, the first database providing open access to a large collection of systematically organized datasets from multiple domains that is especially designed for IP development. The ColabFit Exchange is publicly available at https://colabfit.org, providing a web-based interface for exploring, downloading, and contributing datasets. Composed of data collected from the literature or provided by community researchers, the ColabFit Exchange currently (September 2023) consists of 139 datasets spanning nearly 70 000 unique chemistries, and is intended to continuously grow. In addition to outlining the software framework used for constructing and accessing the ColabFit Exchange, we also provide analyses of the data, quantifying the diversity of the database and proposing metrics for assessing the relative diversity of multiple datasets. Finally, we demonstrate an end-to-end IP development pipeline, utilizing datasets from the ColabFit Exchange, fitting tools from the KLIFF software package, and validation tests provided by the OpenKIM framework.

4.
Phys Rev Lett ; 129(22): 220601, 2022 Nov 23.
Article in English | MEDLINE | ID: mdl-36493452

ABSTRACT

Time-reversal symmetry breaking and entropy production are universal features of nonequilibrium phenomena. Despite its importance in the physics of active and living systems, the entropy production of systems with many degrees of freedom has remained of little practical significance because the high dimensionality of their state space makes it difficult to measure. Here we introduce a local measure of entropy production and a numerical protocol to estimate it. We establish a connection between the entropy production and extractability of work in a given region of the system and show how this quantity depends crucially on the degrees of freedom being tracked. We validate our approach in theory, simulation, and experiments by considering systems of active Brownian particles undergoing motility-induced phase separation, as well as active Brownian particles and E. coli in a rectifying device in which the time-reversal asymmetry of the particle dynamics couples to spatial asymmetry to reveal its effects on a macroscopic scale.


Subject(s)
Physics; Entropy; Computer Simulation; Physics/methods
5.
Phys Rev Lett ; 125(17): 170601, 2020 Oct 23.
Article in English | MEDLINE | ID: mdl-33156672

ABSTRACT

Computable information density (CID), the ratio of the length of a losslessly compressed data file to that of the uncompressed file, is a measure of order and correlation in both equilibrium and nonequilibrium systems. Here we show that correlation lengths can be obtained by decimation, thinning a configuration by sampling data at increasing intervals and recalculating the CID. When the sampling interval is larger than the system's correlation length, the data becomes incompressible. The correlation length and its critical exponents are thus accessible with no a priori knowledge of an order parameter or even the nature of the ordering. The correlation length measured in this way agrees well with that computed from the decay of two-point correlation functions g_{2}(r) when they exist. But the CID reveals the correlation length and its scaling even when g_{2}(r) has no structure, as we demonstrate by "cloaking" the data with a Rudin-Shapiro sequence.
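
The decimation procedure described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes zlib as the lossless compressor, and the helper names (`cid`, `decimate`, `blocky_sequence`) and the toy configuration (random symbols repeated in runs of 16, so the correlation length is 16 samples) are hypothetical.

```python
import random
import zlib


def cid(data: bytes) -> float:
    """CID: length of the losslessly compressed data over the original length."""
    return len(zlib.compress(data, 9)) / len(data)


def decimate(data: bytes, interval: int) -> bytes:
    """Thin a configuration by keeping every `interval`-th sample."""
    return data[::interval]


def blocky_sequence(n_blocks: int, block: int, seed: int = 0) -> bytes:
    """Toy 1D configuration: each random symbol is repeated `block` times."""
    rng = random.Random(seed)
    out = bytearray()
    for _ in range(n_blocks):
        out.extend([rng.randrange(256)] * block)
    return bytes(out)


config = blocky_sequence(4096, 16)  # assumed correlation length: 16 samples
cid_by_interval = {k: cid(decimate(config, k)) for k in (1, 2, 4, 8, 16, 32)}
```

In this toy case the CID should rise with the sampling interval and reach roughly 1 once the interval equals the run length. The decimated samples here differ in size, so compressor overhead differs slightly between intervals; a careful comparison would control for that.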

6.
Proc Natl Acad Sci U S A ; 114(27): 6924-6929, 2017 Jul 03.
Article in English | MEDLINE | ID: mdl-28634292

ABSTRACT

Conventional Monte Carlo simulations are stochastic in the sense that the acceptance of a trial move is decided by comparing a computed acceptance probability with a random number, uniformly distributed between 0 and 1. Here, we consider the case that the weight determining the acceptance probability itself is fluctuating. This situation is common in many numerical studies. We show that it is possible to construct a rigorous Monte Carlo algorithm that visits points in state space with a probability proportional to their average weight. The same approach may have applications for certain classes of high-throughput experiments and the analysis of noisy datasets.
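
One known construction with the property stated above is the pseudo-marginal Metropolis scheme, sketched here as an illustration; the abstract does not specify the authors' algorithm, so this should not be read as their method. The three-state system, its mean weights, and the noise model are hypothetical. The key assumption is that each noisy weight is a non-negative, unbiased estimate of the average weight, and that the estimate for the current state is stored rather than redrawn.

```python
import random

MEAN_WEIGHTS = [1.0, 2.0, 3.0]  # hypothetical average weights of three toy states


def noisy_weight(state, rng):
    """Unbiased, non-negative noisy estimate of the state's average weight."""
    return MEAN_WEIGHTS[state] * rng.choice([0.5, 1.5])


def sample(n_steps, seed=0):
    """Return visit frequencies of a chain driven by fluctuating weights."""
    rng = random.Random(seed)
    state = 0
    weight = noisy_weight(state, rng)
    visits = [0] * len(MEAN_WEIGHTS)
    for _ in range(n_steps):
        trial = rng.randrange(len(MEAN_WEIGHTS))  # symmetric proposal
        trial_weight = noisy_weight(trial, rng)
        # Compare against the stored estimate; the current weight is not redrawn.
        if rng.random() < trial_weight / weight:
            state, weight = trial, trial_weight
        visits[state] += 1
    return [v / n_steps for v in visits]


frequencies = sample(200_000)
```

With these toy weights the visit frequencies should approach 1/6, 1/3, and 1/2, that is, proportional to the average weights rather than to any single noisy realization.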

7.
Phys Chem Chem Phys ; 19(20): 12585-12603, 2017 May 24.
Article in English | MEDLINE | ID: mdl-28367548

ABSTRACT

Machine learning techniques are being increasingly used as flexible non-linear fitting and prediction tools in the physical sciences. Fitting functions that exhibit multiple solutions as local minima can be analysed in terms of the corresponding machine learning landscape. Methods to explore and visualise molecular potential energy landscapes can be applied to these machine learning landscapes to gain new insight into the solution space involved in training and the nature of the corresponding predictions. In particular, we can define quantities analogous to molecular structure, thermodynamics, and kinetics, and relate these emergent properties to the structure of the underlying landscape. This Perspective aims to describe these analogies with examples from recent applications, and suggest avenues for new interdisciplinary research.

8.
ArXiv ; 2024 Sep 05.
Article in English | MEDLINE | ID: mdl-38235067

ABSTRACT

Stochasticity plays a central role in nearly every biological process, and the noise power spectral density (PSD) is a critical tool for understanding variability and information processing in living systems. At steady state, many such processes can be described by stochastic linear time-invariant (LTI) systems driven by Gaussian white noise, whose PSD is a complex rational function of the frequency that can be concisely expressed in terms of their Jacobian, dispersion, and diffusion matrices, fully defining the statistical properties of the system's dynamics at steady state. Here, we arrive at compact element-wise solutions of the rational function coefficients for the auto- and cross-spectrum that enable the explicit analytical computation of the PSD in dimensions n=2,3,4. We further present a recursive Leverrier-Faddeev-type algorithm for the exact computation of the rational function coefficients. Crucially, both solutions are free of matrix inverses. We illustrate our element-wise and recursive solutions by considering the stochastic dynamics of neural systems models, namely FitzHugh-Nagumo (n=2), Hindmarsh-Rose (n=3), Wilson-Cowan (n=4), and the Stabilized Supralinear Network (n=22), as well as an evolutionary game-theoretic model with mutations (n=5, 31). We extend our approach to derive a recursive method for calculating the coefficients in the power series expansion of the integrated covariance matrix for interacting spiking neurons modeled as Hawkes processes on arbitrary directed graphs.
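
For orientation, the classical Faddeev-LeVerrier recursion, on which algorithms of this type build, computes the coefficients of the characteristic polynomial det(λI - A) using only matrix products and traces, with no matrix inverse. The sketch below shows that classical recursion in exact rational arithmetic; it is not the authors' variant for the PSD coefficients, and the function names are hypothetical.

```python
from fractions import Fraction


def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def char_poly(matrix):
    """Coefficients c_0..c_n of det(lambda*I - A), lowest order first."""
    n = len(matrix)
    a = [[Fraction(x) for x in row] for row in matrix]
    coeffs = [Fraction(0)] * n + [Fraction(1)]  # c_n = 1
    m = [[Fraction(0)] * n for _ in range(n)]   # M_0 = 0
    for k in range(1, n + 1):
        # M_k = A M_{k-1} + c_{n-k+1} I
        m = matmul(a, m)
        for i in range(n):
            m[i][i] += coeffs[n - k + 1]
        # c_{n-k} = -tr(A M_k) / k
        am = matmul(a, m)
        coeffs[n - k] = -sum(am[i][i] for i in range(n)) / k
    return coeffs
```

For example, `char_poly([[2, 1], [1, 3]])` returns the coefficients of λ² - 5λ + 5. Applied to a Jacobian, the same polynomial evaluated at iω gives det(iωI - J), whose squared modulus appears in the denominator of the standard PSD expression for such systems.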

9.
Phys Rev E ; 110(3-1): 034122, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39425405

ABSTRACT

Media with correlated disorder display unexpected transport properties, but it is still a challenge to design structures with desired spectral features at scale. In this work, we introduce an optimal formulation of this inverse problem by means of the nonuniform fast Fourier transform, thus arriving at an algorithm capable of generating systems with arbitrary spectral properties, with a computational cost that scales as O(N log N) with system size. The method is extended to accommodate arbitrary real-space interactions, such as short-range repulsion, to simultaneously control short- and long-range correlations. We thus generate the largest-ever stealthy hyperuniform configurations in 2d (N=10^{9}) and 3d (N>10^{7}) and demonstrate the flexibility of the approach by generating structures with designed spectral features at scale. By an Ewald sphere construction we link the spectral and optical properties at the single-scattering level and show that stealthy hyperuniform structures generically display transmission gaps, providing a concrete example of fine-tuning of a physical property. We also show that large 3d power-law hyperuniformity in particle packings leads to single-scattering properties nearly identical to those of simple hard spheres. Finally, we demonstrate generalizations of the approach to impose features in either continuous or discrete real space, using constraints in either continuous or discrete reciprocal space. In particular, enforcing large spectral power at peaks with the right symmetry leads to the nondeterministic generation of quasicrystalline structures in 2d and 3d. This technique should become an essential tool to embed, and understand the role of, long-range correlations in disordered metamaterials.
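
The spectral quantity being constrained in work of this kind is typically the structure factor, S(k) = |Σ_j exp(-i k x_j)|² / N. The sketch below evaluates it by direct summation for a toy 1D lattice. It is only meant to make the definition concrete: the direct sum costs O(N) per wavevector, which is the cost a nonuniform FFT avoids, and neither the function name nor the lattice is taken from the paper.

```python
import cmath
import math


def structure_factor(points, k):
    """S(k) = |sum_j exp(-i k x_j)|^2 / N for 1D positions, by direct summation."""
    amplitude = sum(cmath.exp(-1j * k * x) for x in points)
    return abs(amplitude) ** 2 / len(points)


lattice = [float(j) for j in range(64)]  # toy periodic configuration, spacing 1

# At the reciprocal lattice vector k = 2*pi all terms add in phase.
bragg = structure_factor(lattice, 2 * math.pi)

# At another wavevector allowed by the period-64 box the terms cancel.
off_peak = structure_factor(lattice, 2 * math.pi * 5 / 64)
```

A stealthy hyperuniform configuration is one for which S(k) vanishes over a whole range of small wavevectors, not only at isolated points as in this lattice example.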

10.
ArXiv ; 2024 Oct 31.
Article in English | MEDLINE | ID: mdl-39398197

ABSTRACT

Stability in recurrent neural models poses a significant challenge, particularly in developing biologically plausible neurodynamical models that can be seamlessly trained. Traditional cortical circuit models are notoriously difficult to train due to expansive nonlinearities in the dynamical system, leading to an optimization problem with nonlinear stability constraints that are difficult to impose. Conversely, recurrent neural networks (RNNs) excel in tasks involving sequential data but lack biological plausibility and interpretability. In this work, we address these challenges by linking dynamic divisive normalization (DN) to the stability of ORGaNICs, a biologically plausible recurrent cortical circuit model that dynamically achieves DN and that has been shown to simulate a wide range of neurophysiological phenomena. By using the indirect method of Lyapunov, we prove the remarkable property of unconditional local stability for an arbitrary-dimensional ORGaNICs circuit when the recurrent weight matrix is the identity. We thus connect ORGaNICs to a system of coupled damped harmonic oscillators, which enables us to derive the circuit's energy function, providing a normative principle of what the circuit, and individual neurons, aim to accomplish. Further, for a generic recurrent weight matrix, we prove the stability of the 2D model and demonstrate empirically that stability holds in higher dimensions. Finally, we show that ORGaNICs can be trained by backpropagation through time without gradient clipping/scaling, thanks to its intrinsic stability property and adaptive time constants, which address the problems of exploding, vanishing, and oscillating gradients. By evaluating the model's performance on RNN benchmarks, we find that ORGaNICs outperform alternative neurodynamical models on static image classification tasks and perform comparably to LSTMs on sequential tasks.

11.
ACS Synth Biol ; 12(9): 2600-2615, 2023 Sep 15.
Article in English | MEDLINE | ID: mdl-37642646

ABSTRACT

Engineered proteins have emerged as novel diagnostics, therapeutics, and catalysts. Often, poor protein developability, quantified by expression, solubility, and stability, hinders utility. The ability to predict protein developability from amino acid sequence would reduce the experimental burden when selecting candidates. Recent advances in screening technologies enabled a high-throughput (HT) developability dataset for 10^{5} of 10^{20} possible variants of protein ligand scaffold Gp2. In this work, we evaluate the ability of neural networks to learn a developability representation from an HT dataset and transfer this knowledge to predict recombinant expression beyond observed sequences. The model convolves learned amino acid properties to predict expression levels 44% closer to the experimental variance compared to a non-embedded control. Analysis of learned amino acid embeddings highlights the uniqueness of cysteine, the importance of hydrophobicity and charge, and the unimportance of aromaticity, when aiming to improve the developability of small proteins. We identify clusters of similar sequences with increased recombinant expression through nonlinear dimensionality reduction and we explore the inferred expression landscape via nested sampling. The analysis enables the first direct visualization of the fitness landscape and highlights the existence of evolutionary bottlenecks in sequence space giving rise to competing subpopulations of sequences with different developability. The work advances applied protein engineering efforts by predicting and interpreting protein scaffold expression from a limited dataset. Furthermore, our statistical mechanical treatment of the problem advances foundational efforts to characterize the structure of the protein fitness landscape and the amino acid characteristics that influence protein developability.


Subject(s)
Amino Acids; Cysteine; Amino Acid Sequence; Neural Networks, Computer; Protein Engineering
12.
Phys Rev E ; 94(3-1): 031301, 2016 Sep.
Article in English | MEDLINE | ID: mdl-27739758

ABSTRACT

We propose an efficient Monte Carlo method for the computation of the volumes of high-dimensional bodies with arbitrary shape. We start with a region of known volume within the interior of the manifold and then use the multistate Bennett acceptance-ratio method to compute the dimensionless free-energy difference between a series of equilibrium simulations performed within this object. The method produces results that are in excellent agreement with thermodynamic integration, as well as a direct estimate of the associated statistical uncertainties. The histogram method also allows us to directly obtain an estimate of the interior radial probability density profile, thus yielding useful insight into the structural properties of such a high-dimensional body. We illustrate the method by analyzing the effect of structural disorder on the basins of attraction of mechanically stable packings of soft repulsive spheres.

13.
Phys Rev E ; 93(1): 012906, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26871142

ABSTRACT

We present a numerical calculation of the total number of disordered jammed configurations Ω of N repulsive, three-dimensional spheres in a fixed volume V. To make these calculations tractable, we increase the computational efficiency of the approach of Xu et al. [Phys. Rev. Lett. 106, 245502 (2011), doi:10.1103/PhysRevLett.106.245502] and Asenjo et al. [Phys. Rev. Lett. 112, 098002 (2014), doi:10.1103/PhysRevLett.112.098002] and we extend the method to allow computation of the configurational entropy as a function of pressure. The approach that we use computes the configurational entropy by sampling the absolute volume of basins of attraction of the stable packings in the potential energy landscape. We find a surprisingly strong correlation between the pressure of a configuration and the volume of its basin of attraction in the potential energy landscape. This relation is well described by a power law. Our methodology to compute the number of minima in the potential energy landscape should be applicable to a wide range of other enumeration problems in statistical physics, string theory, cosmology, and machine learning that aim to find the distribution of the extrema of a scalar cost function that depends on many degrees of freedom.
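
The counting idea behind basin-volume sampling can be illustrated with a toy model. If starting points are drawn uniformly from the total volume, a basin of volume v_i is hit with probability v_i / V, so the sampled average of 1/v equals Ω / V. The sketch below assumes the basin volumes are known exactly and that the basins tile a one-dimensional unit interval; both are hypothetical simplifications, since in the actual problem each basin volume must itself be computed numerically.

```python
import bisect
import random

BASIN_VOLUMES = [0.5, 0.25, 0.125, 0.125]  # hypothetical basins tiling a unit volume


def estimate_count(n_samples, seed=0):
    """Estimate the number of basins as V * <1/v> over uniformly sampled points."""
    rng = random.Random(seed)
    edges = []
    running = 0.0
    for v in BASIN_VOLUMES:
        running += v
        edges.append(running)  # right edge of each basin
    total_volume = edges[-1]
    inverse_sum = 0.0
    for _ in range(n_samples):
        point = rng.random() * total_volume
        basin = bisect.bisect_right(edges, point)  # basin containing the point
        inverse_sum += 1.0 / BASIN_VOLUMES[basin]
    return total_volume * inverse_sum / n_samples


omega_estimate = estimate_count(100_000)
```

Here the estimate should be close to the true count of 4, and the configurational entropy in this toy picture would be its logarithm.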

14.
Chem Commun (Camb) ; 48(18): 2406-8, 2012 Feb 28.
Article in English | MEDLINE | ID: mdl-22274136

ABSTRACT

The order of regeneration for dye-sensitized solar cells (DSCs) based on two organic dyes has been investigated by transient absorption spectroscopy on devices under operating conditions and determined to be second order in iodide. The results shed light on the mechanism and on the limits to the regeneration rate relative to oxidation potential.
