1.
J Chem Phys ; 159(15), 2023 Oct 21.
Article in English | MEDLINE | ID: mdl-37861121

ABSTRACT

Data-driven interatomic potentials (IPs) trained on large collections of first principles calculations are rapidly becoming essential tools in the fields of computational materials science and chemistry for performing atomic-scale simulations. Despite this, apart from a few notable exceptions, there is a distinct lack of well-organized, public datasets in common formats available for use with IP development. This deficiency precludes the research community from implementing widespread benchmarking, which is essential for gaining insight into model performance and transferability, and also limits the development of more general, or even universal, IPs. To address this issue, we introduce the ColabFit Exchange, the first database providing open access to a large collection of systematically organized datasets from multiple domains that is especially designed for IP development. The ColabFit Exchange is publicly available at https://colabfit.org, providing a web-based interface for exploring, downloading, and contributing datasets. Composed of data collected from the literature or provided by community researchers, the ColabFit Exchange currently (September 2023) consists of 139 datasets spanning nearly 70 000 unique chemistries, and is intended to continuously grow. In addition to outlining the software framework used for constructing and accessing the ColabFit Exchange, we also provide analyses of the data, quantifying the diversity of the database and proposing metrics for assessing the relative diversity of multiple datasets. Finally, we demonstrate an end-to-end IP development pipeline, utilizing datasets from the ColabFit Exchange, fitting tools from the KLIFF software package, and validation tests provided by the OpenKIM framework.

2.
ACS Synth Biol ; 12(9): 2600-2615, 2023 09 15.
Article in English | MEDLINE | ID: mdl-37642646

ABSTRACT

Engineered proteins have emerged as novel diagnostics, therapeutics, and catalysts. Often, poor protein developability (quantified by expression, solubility, and stability) hinders utility. The ability to predict protein developability from amino acid sequence would reduce the experimental burden when selecting candidates. Recent advances in screening technologies enabled a high-throughput (HT) developability dataset for 10^5 of 10^20 possible variants of protein ligand scaffold Gp2. In this work, we evaluate the ability of neural networks to learn a developability representation from an HT dataset and transfer this knowledge to predict recombinant expression beyond observed sequences. The model convolves learned amino acid properties to predict expression levels 44% closer to the experimental variance compared to a non-embedded control. Analysis of learned amino acid embeddings highlights the uniqueness of cysteine, the importance of hydrophobicity and charge, and the unimportance of aromaticity, when aiming to improve the developability of small proteins. We identify clusters of similar sequences with increased recombinant expression through nonlinear dimensionality reduction and we explore the inferred expression landscape via nested sampling. The analysis enables the first direct visualization of the fitness landscape and highlights the existence of evolutionary bottlenecks in sequence space giving rise to competing subpopulations of sequences with different developability. The work advances applied protein engineering efforts by predicting and interpreting protein scaffold expression from a limited dataset. Furthermore, our statistical mechanical treatment of the problem advances foundational efforts to characterize the structure of the protein fitness landscape and the amino acid characteristics that influence protein developability.


Subjects
Amino Acids ; Cysteine ; Amino Acid Sequence ; Neural Networks, Computer ; Protein Engineering
3.
J Chem Phys ; 158(4): 044901, 2023 Jan 28.
Article in English | MEDLINE | ID: mdl-36725501

ABSTRACT

We show that an analogy between crowding in fluid and jammed phases of hard spheres captures the density dependence of the kissing number for a family of numerically generated jammed states. We extend this analogy to jams of mixtures of hard spheres in d = 3 dimensions and, thus, obtain an estimate of the random close packing volume fraction, ϕ_RCP, as a function of size polydispersity. We first consider mixtures of particle sizes with discrete distributions. For binary systems, we show agreement between our predictions and simulations using both our own results and results reported in previous studies, as well as agreement with recent experiments from the literature. We then apply our approach to systems with continuous polydispersity using three different particle size distributions, namely, the log-normal, Gamma, and truncated power-law distributions. In all cases, we observe agreement between our theoretical findings and numerical results up to rather large polydispersities for all particle size distributions when using as reference our own simulations and results from the literature. In particular, we find ϕ_RCP to increase monotonically with the relative standard deviation, s_σ, of the distribution and to saturate at a value that always remains below 1. A perturbative expansion yields a closed-form expression for ϕ_RCP that quantitatively captures a distribution-independent regime for s_σ < 0.5. Beyond that regime, we show that the gradual loss in agreement is tied to the growth of the skewness of size distributions.

4.
ArXiv ; 2023 Dec 26.
Article in English | MEDLINE | ID: mdl-38235067

ABSTRACT

Stochasticity plays a central role in nearly every biological process, and the noise power spectral density (PSD) is a critical tool for understanding variability and information processing in living systems. At steady state, many such processes can be described by stochastic linear time-invariant (LTI) systems driven by Gaussian white noise, whose PSD is a complex rational function of the frequency that can be concisely expressed in terms of their Jacobian, dispersion, and diffusion matrices, fully defining the statistical properties of the system's dynamics at steady state. Here, we arrive at compact element-wise solutions of the rational function coefficients for the auto- and cross-spectrum that enable the explicit analytical computation of the PSD in dimensions n = 2, 3, 4. We further present a recursive Leverrier-Faddeev-type algorithm for the exact computation of the rational function coefficients. Crucially, both solutions are free of matrix inverses. We illustrate our element-wise and recursive solutions by considering the stochastic dynamics of neural systems models, namely FitzHugh-Nagumo (n = 2), Hindmarsh-Rose (n = 3), Wilson-Cowan (n = 4), and the Stabilized Supralinear Network (n = 22), as well as an evolutionary game-theoretic model with mutations (n = 5, 31).
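As a rough illustration of how a PSD can be evaluated without a matrix inverse, the sketch below uses the classical Faddeev-LeVerrier recursion, which yields the characteristic polynomial and the adjugate of (sI - J) from matrix products and traces only. It is not the element-wise or recursive solution of the paper. It assumes the convention dx/dt = J x + noise with noise covariance (diffusion) matrix D, and S(ω) = (iωI - J)^{-1} D (iωI - J)^{-H}; normalization factors differ between conventions. The function names `faddeev_leverrier` and `psd` are hypothetical.

```python
import numpy as np

def faddeev_leverrier(A):
    """Classical Faddeev-LeVerrier recursion.

    Returns (c, M) where c[k] is the coefficient of s**k in det(sI - A)
    and M[k-1] = M_k satisfies adj(sI - A) = sum_{k=1..n} s**(n-k) * M_k.
    """
    n = A.shape[0]
    c = np.zeros(n + 1)
    c[n] = 1.0
    M = []
    Mk = np.eye(n)                      # M_1 = I
    for k in range(1, n + 1):
        M.append(Mk)
        c[n - k] = -np.trace(A @ Mk) / k
        Mk = A @ Mk + c[n - k] * np.eye(n)   # M_{k+1}
    return c, M

def psd(J, D, omega):
    """PSD matrix at angular frequency omega, using no matrix inverse.

    Assumed convention: S = R D R^H with R = (i*omega*I - J)^{-1},
    and R = adj(i*omega*I - J) / det(i*omega*I - J).
    """
    n = J.shape[0]
    c, M = faddeev_leverrier(J)
    s = 1j * omega
    adj = sum(s ** (n - k) * M[k - 1] for k in range(1, n + 1))
    det = sum(c[k] * s ** k for k in range(n + 1))
    return adj @ D @ adj.conj().T / abs(det) ** 2
```

The diagonal entries of the returned matrix are auto-spectra and the off-diagonal entries are cross-spectra, under the stated convention.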

5.
Phys Rev Lett ; 129(22): 220601, 2022 Nov 23.
Article in English | MEDLINE | ID: mdl-36493452

ABSTRACT

Time-reversal symmetry breaking and entropy production are universal features of nonequilibrium phenomena. Despite its importance in the physics of active and living systems, the entropy production of systems with many degrees of freedom has remained of little practical significance because the high dimensionality of their state space makes it difficult to measure. Here we introduce a local measure of entropy production and a numerical protocol to estimate it. We establish a connection between the entropy production and extractability of work in a given region of the system and show how this quantity depends crucially on the degrees of freedom being tracked. We validate our approach in theory, simulation, and experiments by considering systems of active Brownian particles undergoing motility-induced phase separation, as well as active Brownian particles and E. coli in a rectifying device in which the time-reversal asymmetry of the particle dynamics couples to spatial asymmetry to reveal its effects on a macroscopic scale.


Subjects
Physics ; Entropy ; Computer Simulation ; Physics/methods
6.
Proc Natl Acad Sci U S A ; 118(23), 2021 06 08.
Article in English | MEDLINE | ID: mdl-34078670

ABSTRACT

Proteins require high developability (quantified by expression, solubility, and stability) for robust utility as therapeutics, diagnostics, and in other biotechnological applications. Measuring traditional developability metrics is low throughput in nature, often slowing the developmental pipeline. We evaluated the ability of 10 variations of three high-throughput developability assays to predict the bacterial recombinant expression of paratope variants of the protein scaffold Gp2. Enabled by a phenotype/genotype linkage, assay performance for 10^5 variants was calculated via deep sequencing of populations sorted by proxied developability. We identified the most informative assay combination via cross-validation accuracy and correlation feature selection and demonstrated the ability of machine learning models to exploit nonlinear mutual information to increase the assays' predictive utility. We trained a random forest model that predicts expression from assay performance that is 35% closer to the experimental variance and trains 80% more efficiently than a model predicting from sequence information alone. Utilizing the predicted expression, we performed a site-wise analysis and predicted mutations consistent with enhanced developability. The validated assays offer the ability to identify developable proteins at unprecedented scales, reducing the bottleneck of protein commercialization.


Subjects
Databases, Nucleic Acid ; Gene Library ; High-Throughput Screening Assays ; Machine Learning ; Proteins/genetics
7.
Phys Rev Lett ; 125(17): 170601, 2020 Oct 23.
Article in English | MEDLINE | ID: mdl-33156672

ABSTRACT

Computable information density (CID), the ratio of the length of a losslessly compressed data file to that of the uncompressed file, is a measure of order and correlation in both equilibrium and nonequilibrium systems. Here we show that correlation lengths can be obtained by decimation, thinning a configuration by sampling data at increasing intervals and recalculating the CID. When the sampling interval is larger than the system's correlation length, the data becomes incompressible. The correlation length and its critical exponents are thus accessible with no a priori knowledge of an order parameter or even the nature of the ordering. The correlation length measured in this way agrees well with that computed from the decay of two-point correlation functions g_2(r) when they exist. But the CID reveals the correlation length and its scaling even when g_2(r) has no structure, as we demonstrate by "cloaking" the data with a Rudin-Shapiro sequence.
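The decimation idea can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the authors' protocol: it uses zlib as the lossless compressor, a one-dimensional binary configuration, and a toy data generator (`correlated_bits`, a hypothetical helper) whose "correlation length" is simply the run length of repeated bits.

```python
import zlib
import numpy as np

def cid(bits):
    """Compressed length divided by uncompressed length for a 0/1 array.

    Bits are packed eight per byte so that uncorrelated data is close to
    incompressible. zlib is an assumption; any lossless compressor works.
    """
    raw = np.packbits(np.asarray(bits, dtype=np.uint8)).tobytes()
    return len(zlib.compress(raw, 9)) / len(raw)

def decimated_cid(bits, interval):
    """CID after thinning: keep every `interval`-th sample, then recompute."""
    return cid(bits[::interval])

def correlated_bits(n, block, rng):
    """Toy configuration: random bits, each repeated `block` times in a row."""
    n_blocks = -(-n // block)  # ceiling division
    base = rng.integers(0, 2, size=n_blocks, dtype=np.uint8)
    return np.repeat(base, block)[:n]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    config = correlated_bits(2**18, 16, rng)
    for interval in (1, 2, 4, 8, 16, 32):
        print(interval, round(decimated_cid(config, interval), 3))
```

In this toy setting the CID rises as the sampling interval grows and levels off near 1 once the interval reaches the run length, which is the qualitative signature the abstract describes.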

8.
Proc Natl Acad Sci U S A ; 114(27): 6924-6929, 2017 07 03.
Article in English | MEDLINE | ID: mdl-28634292

ABSTRACT

Conventional Monte Carlo simulations are stochastic in the sense that the acceptance of a trial move is decided by comparing a computed acceptance probability with a random number, uniformly distributed between 0 and 1. Here, we consider the case that the weight determining the acceptance probability itself is fluctuating. This situation is common in many numerical studies. We show that it is possible to construct a rigorous Monte Carlo algorithm that visits points in state space with a probability proportional to their average weight. The same approach may have applications for certain classes of high-throughput experiments and the analysis of noisy datasets.
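For orientation, the conventional scheme described in the first sentence of the abstract can be sketched as a standard Metropolis sampler with a deterministic weight. This is the baseline only, not the algorithm for fluctuating weights that the paper constructs. The function name `metropolis`, the symmetric uniform trial move, and the one-dimensional state are assumptions made for illustration.

```python
import math
import random

def metropolis(weight, x0, n_steps, step_size, rng):
    """Conventional Metropolis sampling of a 1D density proportional to weight(x).

    A trial move is accepted when a uniform random number in [0, 1) is
    smaller than the acceptance probability min(1, w_trial / w_current).
    """
    x, w = x0, weight(x0)
    samples = []
    for _ in range(n_steps):
        x_trial = x + rng.uniform(-step_size, step_size)  # symmetric proposal
        w_trial = weight(x_trial)
        if rng.random() < min(1.0, w_trial / w):
            x, w = x_trial, w_trial
        samples.append(x)  # a rejected move repeats the current state
    return samples

if __name__ == "__main__":
    rng = random.Random(1)
    gaussian_weight = lambda x: math.exp(-0.5 * x * x)
    chain = metropolis(gaussian_weight, 0.0, 100_000, 2.0, rng)
    mean = sum(chain) / len(chain)
    print("sample mean:", mean)
```

Here `weight` is evaluated exactly at every step; the case treated in the abstract is the one where each such evaluation returns a noisy value.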

9.
Phys Chem Chem Phys ; 19(20): 12585-12603, 2017 May 24.
Article in English | MEDLINE | ID: mdl-28367548

ABSTRACT

Machine learning techniques are being increasingly used as flexible non-linear fitting and prediction tools in the physical sciences. Fitting functions that exhibit multiple solutions as local minima can be analysed in terms of the corresponding machine learning landscape. Methods to explore and visualise molecular potential energy landscapes can be applied to these machine learning landscapes to gain new insight into the solution space involved in training and the nature of the corresponding predictions. In particular, we can define quantities analogous to molecular structure, thermodynamics, and kinetics, and relate these emergent properties to the structure of the underlying landscape. This Perspective aims to describe these analogies with examples from recent applications, and suggest avenues for new interdisciplinary research.

10.
Phys Rev E ; 94(3-1): 031301, 2016 Sep.
Article in English | MEDLINE | ID: mdl-27739758

ABSTRACT

We propose an efficient Monte Carlo method for the computation of the volumes of high-dimensional bodies with arbitrary shape. We start with a region of known volume within the interior of the manifold and then use the multistate Bennett acceptance-ratio method to compute the dimensionless free-energy difference between a series of equilibrium simulations performed within this object. The method produces results that are in excellent agreement with thermodynamic integration, as well as a direct estimate of the associated statistical uncertainties. The histogram method also allows us to directly obtain an estimate of the interior radial probability density profile, thus yielding useful insight into the structural properties of such a high-dimensional body. We illustrate the method by analyzing the effect of structural disorder on the basins of attraction of mechanically stable packings of soft repulsive spheres.

11.
Phys Rev E ; 93(1): 012906, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26871142

ABSTRACT

We present a numerical calculation of the total number of disordered jammed configurations Ω of N repulsive, three-dimensional spheres in a fixed volume V. To make these calculations tractable, we increase the computational efficiency of the approach of Xu et al. [Phys. Rev. Lett. 106, 245502 (2011), doi:10.1103/PhysRevLett.106.245502] and Asenjo et al. [Phys. Rev. Lett. 112, 098002 (2014), doi:10.1103/PhysRevLett.112.098002] and we extend the method to allow computation of the configurational entropy as a function of pressure. The approach that we use computes the configurational entropy by sampling the absolute volume of basins of attraction of the stable packings in the potential energy landscape. We find a surprisingly strong correlation between the pressure of a configuration and the volume of its basin of attraction in the potential energy landscape. This relation is well described by a power law. Our methodology to compute the number of minima in the potential energy landscape should be applicable to a wide range of other enumeration problems in statistical physics, string theory, cosmology, and machine learning that aim to find the distribution of the extrema of a scalar cost function that depends on many degrees of freedom.

12.
Chem Commun (Camb) ; 48(18): 2406-8, 2012 Feb 28.
Article in English | MEDLINE | ID: mdl-22274136

ABSTRACT

The order of regeneration for dye-sensitized solar cells (DSCs) based on two organic dyes has been investigated by transient absorption spectroscopy on devices under operating conditions and determined to be second order in iodide. The results shed light on the mechanism and limits to the regeneration rate relative to oxidation potential.
