Search | VHL Regional Portal

Quantifying Unbiased Conformational Ensembles from Biased Simulations Using ShapeGMM.

Sasmal, Subarna; Pal, Triasha; Hocky, Glen M; McCullagh, Martin.

J Chem Theory Comput ; 20(9): 3492-3502, 2024 May 14.

Article in English | MEDLINE | ID: mdl-38662196

ABSTRACT

Quantifying the conformational ensembles of biomolecules is fundamental to describing mechanisms of processes such as protein folding, interconversion between folded states, ligand binding, and allosteric regulation. Accurate quantification of these ensembles remains a challenge for conventional molecular simulations of all but the simplest molecules due to insufficient sampling. Enhanced sampling approaches, such as metadynamics, were designed to overcome this challenge; however, the nonuniform frame weights that result from many of these approaches present an additional challenge to ensemble quantification techniques such as Markov State Modeling or structural clustering. Here, we present rigorous inclusion of nonuniform frame weights into a structural clustering method entitled shapeGMM. The result of frame-weighted shapeGMM is a high dimensional probability density and generative model for the unbiased system from which we can compute important thermodynamic properties such as relative free energies and configurational entropy. The accuracy of this approach is demonstrated by the quantitative agreement between GMMs computed by Hamiltonian reweighting and direct simulation of a coarse-grained helix model system. Furthermore, the relative free energy computed from a shapeGMM probability density of alanine dipeptide reweighted from a metadynamics simulation quantitatively reproduces the underlying free energy in the basins. Finally, the method identifies hidden structures along the actin globular to filamentous-like structural transition from a metadynamics simulation on a linear discriminant analysis coordinate trained on GMM states, illustrating how structural clustering of biased data can lead to biophysical insight. Combined, these results demonstrate that frame-weighted shapeGMM is a powerful approach to quantifying biomolecular ensembles from biased simulations.

Subject(s)

Molecular Dynamics Simulation , Thermodynamics , Dipeptides/chemistry , Protein Conformation , Protein Folding

Reaction Coordinates for Conformational Transitions Using Linear Discriminant Analysis on Positions.

Sasmal, Subarna; McCullagh, Martin; Hocky, Glen M.

J Chem Theory Comput ; 19(14): 4427-4435, 2023 Jul 25.

Article in English | MEDLINE | ID: mdl-37130367

ABSTRACT

In this work, we demonstrate that Linear Discriminant Analysis (LDA) applied to atomic positions in two different states of a biomolecule produces a good reaction coordinate between those two states. Atomic coordinates of a macromolecule are a direct representation of a macromolecular configuration, and yet, they are not used in enhanced sampling studies due to a lack of rotational and translational invariance. We resolve this issue using the technique of our prior work, whereby a molecular configuration is considered a member of an equivalence class in size-and-shape space, which is the set of all configurations that can be translated and rotated to a single point within a reference multivariate Gaussian distribution characterizing a single molecular state. The reaction coordinates produced by LDA applied to positions are shown to be good reaction coordinates both in terms of characterizing the transition between two states of a system within a long molecular dynamics (MD) simulation and also ones that allow us to readily produce free energy estimates along that reaction coordinate using enhanced sampling MD techniques.

Assessment of chemistry knowledge in large language models that generate code.

White, Andrew D; Hocky, Glen M; Gandhi, Heta A; Ansari, Mehrad; Cox, Sam; Wellawatte, Geemi P; Sasmal, Subarna; Yang, Ziyue; Liu, Kangxin; Singh, Yuvraj; Peña Ccoa, Willmor J.

Digit Discov ; 2(2): 368-376, 2023 Apr 11.

Article in English | MEDLINE | ID: mdl-37065678

ABSTRACT

In this work, we investigate the question: do code-generating large language models know chemistry? Our results indicate, mostly yes. To evaluate this, we introduce an expandable framework for evaluating chemistry knowledge in these models, through prompting models to solve chemistry problems posed as coding tasks. To do so, we produce a benchmark set of problems, and evaluate these models based on correctness of code by automated testing and evaluation by experts. We find that recent LLMs are able to write correct code across a variety of topics in chemistry and their accuracy can be increased by 30 percentage points via prompt engineering strategies, like putting copyright notices at the top of files. Our dataset and evaluation tools are open source which can be contributed to or built upon by future researchers, and will serve as a community resource for evaluating the performance of new models as they emerge. We also describe some good practices for employing LLMs in chemistry. The general success of these models demonstrates that their impact on chemistry teaching and research is poised to be enormous.

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL